Index: linux-2.6-tip/Documentation/ABI/testing/debugfs-kmemtrace =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/ABI/testing/debugfs-kmemtrace @@ -0,0 +1,71 @@ +What: /sys/kernel/debug/kmemtrace/ +Date: July 2008 +Contact: Eduard - Gabriel Munteanu +Description: + +In kmemtrace-enabled kernels, the following files are created: + +/sys/kernel/debug/kmemtrace/ + cpu (0400) Per-CPU tracing data, see below. (binary) + total_overruns (0400) Total number of bytes which were dropped from + cpu files because of full buffer condition, + non-binary. (text) + abi_version (0400) Kernel's kmemtrace ABI version. (text) + +Each per-CPU file should be read according to the relay interface. That is, +the reader should set affinity to that specific CPU and, as currently done by +the userspace application (though there are other methods), use poll() with +an infinite timeout before every read(). Otherwise, erroneous data may be +read. The binary data has the following _core_ format: + + Event ID (1 byte) Unsigned integer, one of: + 0 - represents an allocation (KMEMTRACE_EVENT_ALLOC) + 1 - represents a freeing of previously allocated memory + (KMEMTRACE_EVENT_FREE) + Type ID (1 byte) Unsigned integer, one of: + 0 - this is a kmalloc() / kfree() + 1 - this is a kmem_cache_alloc() / kmem_cache_free() + 2 - this is a __get_free_pages() et al. + Event size (2 bytes) Unsigned integer representing the + size of this event. Used to extend + kmemtrace. Discard the bytes you + don't know about. + Sequence number (4 bytes) Signed integer used to reorder data + logged on SMP machines. Wraparound + must be taken into account, although + it is unlikely. + Caller address (8 bytes) Return address to the caller. + Pointer to mem (8 bytes) Pointer to target memory area. Can be + NULL, but not all such calls might be + recorded. + +In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow: + + Requested bytes (8 bytes) Total number of requested bytes, + unsigned, must not be zero. + Allocated bytes (8 bytes) Total number of actually allocated + bytes, unsigned, must not be lower + than requested bytes. + Requested flags (4 bytes) GFP flags supplied by the caller. + Target CPU (4 bytes) Signed integer, valid for event id 1. + If equal to -1, target CPU is the same + as origin CPU, but the reverse might + not be true. + +The data is made available in the same endianness the machine has. + +Other event ids and type ids may be defined and added. Other fields may be +added by increasing event size, but see below for details. +Every modification to the ABI, including new id definitions, are followed +by bumping the ABI version by one. + +Adding new data to the packet (features) is done at the end of the mandatory +data: + Feature size (2 byte) + Feature ID (1 byte) + Feature data (Feature size - 3 bytes) + + +Users: + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git + Index: linux-2.6-tip/Documentation/DMA-API.txt =================================================================== --- linux-2.6-tip.orig/Documentation/DMA-API.txt +++ linux-2.6-tip/Documentation/DMA-API.txt @@ -609,3 +609,109 @@ size is the size (and should be a page-s The return value will be either a pointer to the processor virtual address of the memory, or an error (via PTR_ERR()) if any part of the region is occupied. + +Part III - Debug drivers use of the DMA-API +------------------------------------------- + +The DMA-API as described above as some constraints. DMA addresses must be +released with the corresponding function with the same size for example. With +the advent of hardware IOMMUs it becomes more and more important that drivers +do not violate those constraints. In the worst case such a violation can +result in data corruption up to destroyed filesystems. + +To debug drivers and find bugs in the usage of the DMA-API checking code can +be compiled into the kernel which will tell the developer about those +violations. If your architecture supports it you can select the "Enable +debugging of DMA-API usage" option in your kernel configuration. Enabling this +option has a performance impact. Do not enable it in production kernels. + +If you boot the resulting kernel will contain code which does some bookkeeping +about what DMA memory was allocated for which device. If this code detects an +error it prints a warning message with some details into your kernel log. An +example warning message may look like this: + +------------[ cut here ]------------ +WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448 + check_unmap+0x203/0x490() +Hardware name: +forcedeth 0000:00:08.0: DMA-API: device driver frees DMA memory with wrong + function [device address=0x00000000640444be] [size=66 bytes] [mapped as +single] [unmapped as page] +Modules linked in: nfsd exportfs bridge stp llc r8169 +Pid: 0, comm: swapper Tainted: G W 2.6.28-dmatest-09289-g8bb99c0 #1 +Call Trace: + [] warn_slowpath+0xf2/0x130 + [] _spin_unlock+0x10/0x30 + [] usb_hcd_link_urb_to_ep+0x75/0xc0 + [] _spin_unlock_irqrestore+0x12/0x40 + [] ohci_urb_enqueue+0x19f/0x7c0 + [] queue_work+0x56/0x60 + [] enqueue_task_fair+0x20/0x50 + [] usb_hcd_submit_urb+0x379/0xbc0 + [] cpumask_next_and+0x23/0x40 + [] find_busiest_group+0x207/0x8a0 + [] _spin_lock_irqsave+0x1f/0x50 + [] check_unmap+0x203/0x490 + [] debug_dma_unmap_page+0x49/0x50 + [] nv_tx_done_optimized+0xc6/0x2c0 + [] nv_nic_irq_optimized+0x73/0x2b0 + [] handle_IRQ_event+0x34/0x70 + [] handle_edge_irq+0xc9/0x150 + [] do_IRQ+0xcb/0x1c0 + [] ret_from_intr+0x0/0xa + <4>---[ end trace f6435a98e2a38c0e ]--- + +The driver developer can find the driver and the device including a stacktrace +of the DMA-API call which caused this warning. + +Per default only the first error will result in a warning message. All other +errors will only silently counted. This limitation exist to prevent the code +from flooding your kernel log. To support debugging a device driver this can +be disabled via debugfs. See the debugfs interface documentation below for +details. + +The debugfs directory for the DMA-API debugging code is called dma-api/. In +this directory the following files can currently be found: + + dma-api/all_errors This file contains a numeric value. If this + value is not equal to zero the debugging code + will print a warning for every error it finds + into the kernel log. Be carefull with this + option. It can easily flood your logs. + + dma-api/disabled This read-only file contains the character 'Y' + if the debugging code is disabled. This can + happen when it runs out of memory or if it was + disabled at boot time + + dma-api/error_count This file is read-only and shows the total + numbers of errors found. + + dma-api/num_errors The number in this file shows how many + warnings will be printed to the kernel log + before it stops. This number is initialized to + one at system boot and be set by writing into + this file + + dma-api/min_free_entries + This read-only file can be read to get the + minimum number of free dma_debug_entries the + allocator has ever seen. If this value goes + down to zero the code will disable itself + because it is not longer reliable. + + dma-api/num_free_entries + The current number of free dma_debug_entries + in the allocator. + +If you have this code compiled into your kernel it will be enabled by default. +If you want to boot without the bookkeeping anyway you can provide +'dma_debug=off' as a boot parameter. This will disable DMA-API debugging. +Notice that you can not enable it again at runtime. You have to reboot to do +so. + +When the code disables itself at runtime this is most likely because it ran +out of dma_debug_entries. These entries are preallocated at boot. The number +of preallocated entries is defined per architecture. If it is too low for you +boot with 'dma_debug_entries=' to overwrite the +architectural default. Index: linux-2.6-tip/Documentation/DocBook/genericirq.tmpl =================================================================== --- linux-2.6-tip.orig/Documentation/DocBook/genericirq.tmpl +++ linux-2.6-tip/Documentation/DocBook/genericirq.tmpl @@ -440,6 +440,7 @@ desc->chip->end(); used in the generic IRQ layer. !Iinclude/linux/irq.h +!Iinclude/linux/interrupt.h Index: linux-2.6-tip/Documentation/cputopology.txt =================================================================== --- linux-2.6-tip.orig/Documentation/cputopology.txt +++ linux-2.6-tip/Documentation/cputopology.txt @@ -18,11 +18,11 @@ For an architecture to support this feat these macros in include/asm-XXX/topology.h: #define topology_physical_package_id(cpu) #define topology_core_id(cpu) -#define topology_thread_siblings(cpu) -#define topology_core_siblings(cpu) +#define topology_thread_cpumask(cpu) +#define topology_core_cpumask(cpu) The type of **_id is int. -The type of siblings is cpumask_t. +The type of siblings is (const) struct cpumask *. To be consistent on all architectures, include/linux/topology.h provides default definitions for any of the above macros that are Index: linux-2.6-tip/Documentation/feature-removal-schedule.txt =================================================================== --- linux-2.6-tip.orig/Documentation/feature-removal-schedule.txt +++ linux-2.6-tip/Documentation/feature-removal-schedule.txt @@ -344,3 +344,20 @@ Why: See commits 129f8ae9b1b5be94517da76 Removal is subject to fixing any remaining bugs in ACPI which may cause the thermal throttling not to happen at the right time. Who: Dave Jones , Matthew Garrett + +----------------------------- + +What: __do_IRQ all in one fits nothing interrupt handler +When: 2.6.32 +Why: __do_IRQ was kept for easy migration to the type flow handlers. + More than two years of migration time is enough. +Who: Thomas Gleixner + +----------------------------- + +What: obsolete generic irq defines and typedefs +When: 2.6.30 +Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t) + have been kept around for migration reasons. After more than two years + it's time to remove them finally +Who: Thomas Gleixner Index: linux-2.6-tip/Documentation/ftrace.txt =================================================================== --- linux-2.6-tip.orig/Documentation/ftrace.txt +++ linux-2.6-tip/Documentation/ftrace.txt @@ -15,31 +15,31 @@ Introduction Ftrace is an internal tracer designed to help out developers and designers of systems to find what is going on inside the kernel. -It can be used for debugging or analyzing latencies and performance -issues that take place outside of user-space. +It can be used for debugging or analyzing latencies and +performance issues that take place outside of user-space. Although ftrace is the function tracer, it also includes an -infrastructure that allows for other types of tracing. Some of the -tracers that are currently in ftrace include a tracer to trace -context switches, the time it takes for a high priority task to -run after it was woken up, the time interrupts are disabled, and -more (ftrace allows for tracer plugins, which means that the list of -tracers can always grow). +infrastructure that allows for other types of tracing. Some of +the tracers that are currently in ftrace include a tracer to +trace context switches, the time it takes for a high priority +task to run after it was woken up, the time interrupts are +disabled, and more (ftrace allows for tracer plugins, which +means that the list of tracers can always grow). The File System --------------- -Ftrace uses the debugfs file system to hold the control files as well -as the files to display output. +Ftrace uses the debugfs file system to hold the control files as +well as the files to display output. To mount the debugfs system: # mkdir /debug # mount -t debugfs nodev /debug -(Note: it is more common to mount at /sys/kernel/debug, but for simplicity - this document will use /debug) +( Note: it is more common to mount at /sys/kernel/debug, but for + simplicity this document will use /debug) That's it! (assuming that you have ftrace configured into your kernel) @@ -50,90 +50,124 @@ of ftrace. Here is a list of some of the Note: all time values are in microseconds. - current_tracer: This is used to set or display the current tracer - that is configured. + current_tracer: - available_tracers: This holds the different types of tracers that - have been compiled into the kernel. The tracers - listed here can be configured by echoing their name - into current_tracer. - - tracing_enabled: This sets or displays whether the current_tracer - is activated and tracing or not. Echo 0 into this - file to disable the tracer or 1 to enable it. - - trace: This file holds the output of the trace in a human readable - format (described below). - - latency_trace: This file shows the same trace but the information - is organized more to display possible latencies - in the system (described below). - - trace_pipe: The output is the same as the "trace" file but this - file is meant to be streamed with live tracing. - Reads from this file will block until new data - is retrieved. Unlike the "trace" and "latency_trace" - files, this file is a consumer. This means reading - from this file causes sequential reads to display - more current data. Once data is read from this - file, it is consumed, and will not be read - again with a sequential read. The "trace" and - "latency_trace" files are static, and if the - tracer is not adding more data, they will display - the same information every time they are read. - - trace_options: This file lets the user control the amount of data - that is displayed in one of the above output - files. - - trace_max_latency: Some of the tracers record the max latency. - For example, the time interrupts are disabled. - This time is saved in this file. The max trace - will also be stored, and displayed by either - "trace" or "latency_trace". A new max trace will - only be recorded if the latency is greater than - the value in this file. (in microseconds) - - buffer_size_kb: This sets or displays the number of kilobytes each CPU - buffer can hold. The tracer buffers are the same size - for each CPU. The displayed number is the size of the - CPU buffer and not total size of all buffers. The - trace buffers are allocated in pages (blocks of memory - that the kernel uses for allocation, usually 4 KB in size). - If the last page allocated has room for more bytes - than requested, the rest of the page will be used, - making the actual allocation bigger than requested. - (Note, the size may not be a multiple of the page size due - to buffer managment overhead.) - - This can only be updated when the current_tracer - is set to "nop". - - tracing_cpumask: This is a mask that lets the user only trace - on specified CPUS. The format is a hex string - representing the CPUS. - - set_ftrace_filter: When dynamic ftrace is configured in (see the - section below "dynamic ftrace"), the code is dynamically - modified (code text rewrite) to disable calling of the - function profiler (mcount). This lets tracing be configured - in with practically no overhead in performance. This also - has a side effect of enabling or disabling specific functions - to be traced. Echoing names of functions into this file - will limit the trace to only those functions. - - set_ftrace_notrace: This has an effect opposite to that of - set_ftrace_filter. Any function that is added here will not - be traced. If a function exists in both set_ftrace_filter - and set_ftrace_notrace, the function will _not_ be traced. - - set_ftrace_pid: Have the function tracer only trace a single thread. - - available_filter_functions: This lists the functions that ftrace - has processed and can trace. These are the function - names that you can pass to "set_ftrace_filter" or - "set_ftrace_notrace". (See the section "dynamic ftrace" - below for more details.) + This is used to set or display the current tracer + that is configured. + + available_tracers: + + This holds the different types of tracers that + have been compiled into the kernel. The + tracers listed here can be configured by + echoing their name into current_tracer. + + tracing_enabled: + + This sets or displays whether the current_tracer + is activated and tracing or not. Echo 0 into this + file to disable the tracer or 1 to enable it. + + trace: + + This file holds the output of the trace in a human + readable format (described below). + + latency_trace: + + This file shows the same trace but the information + is organized more to display possible latencies + in the system (described below). + + trace_pipe: + + The output is the same as the "trace" file but this + file is meant to be streamed with live tracing. + Reads from this file will block until new data + is retrieved. Unlike the "trace" and "latency_trace" + files, this file is a consumer. This means reading + from this file causes sequential reads to display + more current data. Once data is read from this + file, it is consumed, and will not be read + again with a sequential read. The "trace" and + "latency_trace" files are static, and if the + tracer is not adding more data, they will display + the same information every time they are read. + + trace_options: + + This file lets the user control the amount of data + that is displayed in one of the above output + files. + + tracing_max_latency: + + Some of the tracers record the max latency. + For example, the time interrupts are disabled. + This time is saved in this file. The max trace + will also be stored, and displayed by either + "trace" or "latency_trace". A new max trace will + only be recorded if the latency is greater than + the value in this file. (in microseconds) + + buffer_size_kb: + + This sets or displays the number of kilobytes each CPU + buffer can hold. The tracer buffers are the same size + for each CPU. The displayed number is the size of the + CPU buffer and not total size of all buffers. The + trace buffers are allocated in pages (blocks of memory + that the kernel uses for allocation, usually 4 KB in size). + If the last page allocated has room for more bytes + than requested, the rest of the page will be used, + making the actual allocation bigger than requested. + ( Note, the size may not be a multiple of the page size + due to buffer managment overhead. ) + + This can only be updated when the current_tracer + is set to "nop". + + tracing_cpumask: + + This is a mask that lets the user only trace + on specified CPUS. The format is a hex string + representing the CPUS. + + set_ftrace_filter: + + When dynamic ftrace is configured in (see the + section below "dynamic ftrace"), the code is dynamically + modified (code text rewrite) to disable calling of the + function profiler (mcount). This lets tracing be configured + in with practically no overhead in performance. This also + has a side effect of enabling or disabling specific functions + to be traced. Echoing names of functions into this file + will limit the trace to only those functions. + + set_ftrace_notrace: + + This has an effect opposite to that of + set_ftrace_filter. Any function that is added here will not + be traced. If a function exists in both set_ftrace_filter + and set_ftrace_notrace, the function will _not_ be traced. + + set_ftrace_pid: + + Have the function tracer only trace a single thread. + + set_graph_function: + + Set a "trigger" function where tracing should start + with the function graph tracer (See the section + "dynamic ftrace" for more details). + + available_filter_functions: + + This lists the functions that ftrace + has processed and can trace. These are the function + names that you can pass to "set_ftrace_filter" or + "set_ftrace_notrace". (See the section "dynamic ftrace" + below for more details.) The Tracers @@ -141,36 +175,66 @@ The Tracers Here is the list of current tracers that may be configured. - function - function tracer that uses mcount to trace all functions. + "function" + + Function call tracer to trace all kernel functions. + + "function_graph_tracer" + + Similar to the function tracer except that the + function tracer probes the functions on their entry + whereas the function graph tracer traces on both entry + and exit of the functions. It then provides the ability + to draw a graph of function calls similar to C code + source. + + "sched_switch" + + Traces the context switches and wakeups between tasks. + + "irqsoff" + + Traces the areas that disable interrupts and saves + the trace with the longest max latency. + See tracing_max_latency. When a new max is recorded, + it replaces the old trace. It is best to view this + trace via the latency_trace file. + + "preemptoff" + + Similar to irqsoff but traces and records the amount of + time for which preemption is disabled. - sched_switch - traces the context switches between tasks. + "preemptirqsoff" - irqsoff - traces the areas that disable interrupts and saves - the trace with the longest max latency. - See tracing_max_latency. When a new max is recorded, - it replaces the old trace. It is best to view this - trace via the latency_trace file. + Similar to irqsoff and preemptoff, but traces and + records the largest time for which irqs and/or preemption + is disabled. - preemptoff - Similar to irqsoff but traces and records the amount of - time for which preemption is disabled. + "wakeup" - preemptirqsoff - Similar to irqsoff and preemptoff, but traces and - records the largest time for which irqs and/or preemption - is disabled. + Traces and records the max latency that it takes for + the highest priority task to get scheduled after + it has been woken up. - wakeup - Traces and records the max latency that it takes for - the highest priority task to get scheduled after - it has been woken up. + "hw-branch-tracer" - nop - This is not a tracer. To remove all tracers from tracing - simply echo "nop" into current_tracer. + Uses the BTS CPU feature on x86 CPUs to traces all + branches executed. + + "nop" + + This is the "trace nothing" tracer. To remove all + tracers from tracing simply echo "nop" into + current_tracer. Examples of using the tracer ---------------------------- -Here are typical examples of using the tracers when controlling them only -with the debugfs interface (without using any user-land utilities). +Here are typical examples of using the tracers when controlling +them only with the debugfs interface (without using any +user-land utilities). Output format: -------------- @@ -187,16 +251,16 @@ Here is an example of the output format bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput -------- -A header is printed with the tracer name that is represented by the trace. -In this case the tracer is "function". Then a header showing the format. Task -name "bash", the task PID "4251", the CPU that it was running on -"01", the timestamp in . format, the function name that was -traced "path_put" and the parent function that called this function -"path_walk". The timestamp is the time at which the function was -entered. +A header is printed with the tracer name that is represented by +the trace. In this case the tracer is "function". Then a header +showing the format. Task name "bash", the task PID "4251", the +CPU that it was running on "01", the timestamp in . +format, the function name that was traced "path_put" and the +parent function that called this function "path_walk". The +timestamp is the time at which the function was entered. -The sched_switch tracer also includes tracing of task wakeups and -context switches. +The sched_switch tracer also includes tracing of task wakeups +and context switches. ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S @@ -205,8 +269,8 @@ context switches. kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R -Wake ups are represented by a "+" and the context switches are shown as -"==>". The format is: +Wake ups are represented by a "+" and the context switches are +shown as "==>". The format is: Context switches: @@ -220,19 +284,20 @@ Wake ups are represented by a "+" and th :: + :: -The prio is the internal kernel priority, which is the inverse of the -priority that is usually displayed by user-space tools. Zero represents -the highest priority (99). Prio 100 starts the "nice" priorities with -100 being equal to nice -20 and 139 being nice 19. The prio "140" is -reserved for the idle task which is the lowest priority thread (pid 0). +The prio is the internal kernel priority, which is the inverse +of the priority that is usually displayed by user-space tools. +Zero represents the highest priority (99). Prio 100 starts the +"nice" priorities with 100 being equal to nice -20 and 139 being +nice 19. The prio "140" is reserved for the idle task which is +the lowest priority thread (pid 0). Latency trace format -------------------- -For traces that display latency times, the latency_trace file gives -somewhat more information to see why a latency happened. Here is a typical -trace. +For traces that display latency times, the latency_trace file +gives somewhat more information to see why a latency happened. +Here is a typical trace. # tracer: irqsoff # @@ -259,20 +324,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-r -0 0d.s1 98us : trace_hardirqs_on (do_softirq) +This shows that the current tracer is "irqsoff" tracing the time +for which interrupts were disabled. It gives the trace version +and the version of the kernel upon which this was executed on +(2.6.26-rc8). Then it displays the max latency in microsecs (97 +us). The number of trace entries displayed and the total number +recorded (both are three: #3/3). The type of preemption that was +used (PREEMPT). VP, KP, SP, and HP are always zero and are +reserved for later use. #P is the number of online CPUS (#P:2). -This shows that the current tracer is "irqsoff" tracing the time for which -interrupts were disabled. It gives the trace version and the version -of the kernel upon which this was executed on (2.6.26-rc8). Then it displays -the max latency in microsecs (97 us). The number of trace entries displayed -and the total number recorded (both are three: #3/3). The type of -preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero -and are reserved for later use. #P is the number of online CPUS (#P:2). - -The task is the process that was running when the latency occurred. -(swapper pid: 0). +The task is the process that was running when the latency +occurred. (swapper pid: 0). -The start and stop (the functions in which the interrupts were disabled and -enabled respectively) that caused the latencies: +The start and stop (the functions in which the interrupts were +disabled and enabled respectively) that caused the latencies: apic_timer_interrupt is where the interrupts were disabled. do_softirq is where they were enabled again. @@ -308,12 +373,12 @@ The above is mostly meaningful for kerne latency_trace file is relative to the start of the trace. delay: This is just to help catch your eye a bit better. And - needs to be fixed to be only relative to the same CPU. - The marks are determined by the difference between this - current trace and the next trace. - '!' - greater than preempt_mark_thresh (default 100) - '+' - greater than 1 microsecond - ' ' - less than or equal to 1 microsecond. + needs to be fixed to be only relative to the same CPU. + The marks are determined by the difference between this + current trace and the next trace. + '!' - greater than preempt_mark_thresh (default 100) + '+' - greater than 1 microsecond + ' ' - less than or equal to 1 microsecond. The rest is the same as the 'trace' file. @@ -321,14 +386,15 @@ The above is mostly meaningful for kerne trace_options ------------- -The trace_options file is used to control what gets printed in the trace -output. To see what is available, simply cat the file: +The trace_options file is used to control what gets printed in +the trace output. To see what is available, simply cat the file: cat /debug/tracing/trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ - noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj + noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj -To disable one of the options, echo in the option prepended with "no". +To disable one of the options, echo in the option prepended with +"no". echo noprint-parent > /debug/tracing/trace_options @@ -338,8 +404,8 @@ To enable an option, leave off the "no". Here are the available options: - print-parent - On function traces, display the calling function - as well as the function being traced. + print-parent - On function traces, display the calling (parent) + function as well as the function being traced. print-parent: bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul @@ -348,15 +414,16 @@ Here are the available options: bash-4000 [01] 1477.606694: simple_strtoul - sym-offset - Display not only the function name, but also the offset - in the function. For example, instead of seeing just - "ktime_get", you will see "ktime_get+0xb/0x20". + sym-offset - Display not only the function name, but also the + offset in the function. For example, instead of + seeing just "ktime_get", you will see + "ktime_get+0xb/0x20". sym-offset: bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 - sym-addr - this will also display the function address as well as - the function name. + sym-addr - this will also display the function address as well + as the function name. sym-addr: bash-4000 [01] 1477.606694: simple_strtoul @@ -366,35 +433,41 @@ Here are the available options: bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ (+0.000ms): simple_strtoul (strict_strtoul) - raw - This will display raw numbers. This option is best for use with - user applications that can translate the raw numbers better than - having it done in the kernel. + raw - This will display raw numbers. This option is best for + use with user applications that can translate the raw + numbers better than having it done in the kernel. - hex - Similar to raw, but the numbers will be in a hexadecimal format. + hex - Similar to raw, but the numbers will be in a hexadecimal + format. bin - This will print out the formats in raw binary. block - TBD (needs update) - stacktrace - This is one of the options that changes the trace itself. - When a trace is recorded, so is the stack of functions. - This allows for back traces of trace sites. - - userstacktrace - This option changes the trace. - It records a stacktrace of the current userspace thread. - - sym-userobj - when user stacktrace are enabled, look up which object the - address belongs to, and print a relative address - This is especially useful when ASLR is on, otherwise you don't - get a chance to resolve the address to object/file/line after the app is no - longer running + stacktrace - This is one of the options that changes the trace + itself. When a trace is recorded, so is the stack + of functions. This allows for back traces of + trace sites. + + userstacktrace - This option changes the trace. It records a + stacktrace of the current userspace thread. + + sym-userobj - when user stacktrace are enabled, look up which + object the address belongs to, and print a + relative address. This is especially useful when + ASLR is on, otherwise you don't get a chance to + resolve the address to object/file/line after + the app is no longer running - The lookup is performed when you read trace,trace_pipe,latency_trace. Example: + The lookup is performed when you read + trace,trace_pipe,latency_trace. Example: a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] - sched-tree - TBD (any users??) + sched-tree - trace all tasks that are on the runqueue, at + every scheduling event. Will add overhead if + there's a lot of tasks running at once. sched_switch @@ -431,18 +504,19 @@ of how to use it. [...] -As we have discussed previously about this format, the header shows -the name of the trace and points to the options. The "FUNCTION" -is a misnomer since here it represents the wake ups and context -switches. - -The sched_switch file only lists the wake ups (represented with '+') -and context switches ('==>') with the previous task or current task -first followed by the next task or task waking up. The format for both -of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO -is the inverse of the actual priority with zero (0) being the highest -priority and the nice values starting at 100 (nice -20). Below is -a quick chart to map the kernel priority to user land priorities. +As we have discussed previously about this format, the header +shows the name of the trace and points to the options. The +"FUNCTION" is a misnomer since here it represents the wake ups +and context switches. + +The sched_switch file only lists the wake ups (represented with +'+') and context switches ('==>') with the previous task or +current task first followed by the next task or task waking up. +The format for both of these is PID:KERNEL-PRIO:TASK-STATE. +Remember that the KERNEL-PRIO is the inverse of the actual +priority with zero (0) being the highest priority and the nice +values starting at 100 (nice -20). Below is a quick chart to map +the kernel priority to user land priorities. Kernel priority: 0 to 99 ==> user RT priority 99 to 0 Kernel priority: 100 to 139 ==> user nice -20 to 19 @@ -463,10 +537,10 @@ The task states are: ftrace_enabled -------------- -The following tracers (listed below) give different output depending -on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled, -one can either use the sysctl function or set it via the proc -file system interface. +The following tracers (listed below) give different output +depending on whether or not the sysctl ftrace_enabled is set. To +set ftrace_enabled, one can either use the sysctl function or +set it via the proc file system interface. sysctl kernel.ftrace_enabled=1 @@ -474,12 +548,12 @@ file system interface. echo 1 > /proc/sys/kernel/ftrace_enabled -To disable ftrace_enabled simply replace the '1' with '0' in -the above commands. +To disable ftrace_enabled simply replace the '1' with '0' in the +above commands. -When ftrace_enabled is set the tracers will also record the functions -that are within the trace. The descriptions of the tracers -will also show an example with ftrace enabled. +When ftrace_enabled is set the tracers will also record the +functions that are within the trace. The descriptions of the +tracers will also show an example with ftrace enabled. irqsoff @@ -487,17 +561,18 @@ irqsoff When interrupts are disabled, the CPU can not react to any other external event (besides NMIs and SMIs). This prevents the timer -interrupt from triggering or the mouse interrupt from letting the -kernel know of a new mouse event. The result is a latency with the -reaction time. - -The irqsoff tracer tracks the time for which interrupts are disabled. -When a new maximum latency is hit, the tracer saves the trace leading up -to that latency point so that every time a new maximum is reached, the old -saved trace is discarded and the new trace is saved. +interrupt from triggering or the mouse interrupt from letting +the kernel know of a new mouse event. The result is a latency +with the reaction time. + +The irqsoff tracer tracks the time for which interrupts are +disabled. When a new maximum latency is hit, the tracer saves +the trace leading up to that latency point so that every time a +new maximum is reached, the old saved trace is discarded and the +new trace is saved. -To reset the maximum, echo 0 into tracing_max_latency. Here is an -example: +To reset the maximum, echo 0 into tracing_max_latency. Here is +an example: # echo irqsoff > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -532,10 +607,11 @@ irqsoff latency trace v1.1.5 on 2.6.26 Here we see that that we had a latency of 12 microsecs (which is -very good). The _write_lock_irq in sys_setpgid disabled interrupts. -The difference between the 12 and the displayed timestamp 14us occurred -because the clock was incremented between the time of recording the max -latency and the time of recording the function that had that latency. +very good). The _write_lock_irq in sys_setpgid disabled +interrupts. The difference between the 12 and the displayed +timestamp 14us occurred because the clock was incremented +between the time of recording the max latency and the time of +recording the function that had that latency. Note the above example had ftrace_enabled not set. If we set the ftrace_enabled, we get a much larger output: @@ -586,24 +662,24 @@ irqsoff latency trace v1.1.5 on 2.6.26-r Here we traced a 50 microsecond latency. But we also see all the -functions that were called during that time. Note that by enabling -function tracing, we incur an added overhead. This overhead may -extend the latency times. But nevertheless, this trace has provided -some very helpful debugging information. +functions that were called during that time. Note that by +enabling function tracing, we incur an added overhead. This +overhead may extend the latency times. But nevertheless, this +trace has provided some very helpful debugging information. preemptoff ---------- -When preemption is disabled, we may be able to receive interrupts but -the task cannot be preempted and a higher priority task must wait -for preemption to be enabled again before it can preempt a lower -priority task. +When preemption is disabled, we may be able to receive +interrupts but the task cannot be preempted and a higher +priority task must wait for preemption to be enabled again +before it can preempt a lower priority task. The preemptoff tracer traces the places that disable preemption. -Like the irqsoff tracer, it records the maximum latency for which preemption -was disabled. The control of preemptoff tracer is much like the irqsoff -tracer. +Like the irqsoff tracer, it records the maximum latency for +which preemption was disabled. The control of preemptoff tracer +is much like the irqsoff tracer. # echo preemptoff > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -637,11 +713,12 @@ preemptoff latency trace v1.1.5 on 2.6.2 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) -This has some more changes. Preemption was disabled when an interrupt -came in (notice the 'h'), and was enabled while doing a softirq. -(notice the 's'). But we also see that interrupts have been disabled -when entering the preempt off section and leaving it (the 'd'). -We do not know if interrupts were enabled in the mean time. +This has some more changes. Preemption was disabled when an +interrupt came in (notice the 'h'), and was enabled while doing +a softirq. (notice the 's'). But we also see that interrupts +have been disabled when entering the preempt off section and +leaving it (the 'd'). We do not know if interrupts were enabled +in the mean time. # tracer: preemptoff # @@ -700,28 +777,30 @@ preemptoff latency trace v1.1.5 on 2.6.2 sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq) -The above is an example of the preemptoff trace with ftrace_enabled -set. Here we see that interrupts were disabled the entire time. -The irq_enter code lets us know that we entered an interrupt 'h'. -Before that, the functions being traced still show that it is not -in an interrupt, but we can see from the functions themselves that -this is not the case. - -Notice that __do_softirq when called does not have a preempt_count. -It may seem that we missed a preempt enabling. What really happened -is that the preempt count is held on the thread's stack and we -switched to the softirq stack (4K stacks in effect). The code -does not copy the preempt count, but because interrupts are disabled, -we do not need to worry about it. Having a tracer like this is good -for letting people know what really happens inside the kernel. +The above is an example of the preemptoff trace with +ftrace_enabled set. Here we see that interrupts were disabled +the entire time. The irq_enter code lets us know that we entered +an interrupt 'h'. Before that, the functions being traced still +show that it is not in an interrupt, but we can see from the +functions themselves that this is not the case. + +Notice that __do_softirq when called does not have a +preempt_count. It may seem that we missed a preempt enabling. +What really happened is that the preempt count is held on the +thread's stack and we switched to the softirq stack (4K stacks +in effect). The code does not copy the preempt count, but +because interrupts are disabled, we do not need to worry about +it. Having a tracer like this is good for letting people know +what really happens inside the kernel. preemptirqsoff -------------- -Knowing the locations that have interrupts disabled or preemption -disabled for the longest times is helpful. But sometimes we would -like to know when either preemption and/or interrupts are disabled. +Knowing the locations that have interrupts disabled or +preemption disabled for the longest times is helpful. But +sometimes we would like to know when either preemption and/or +interrupts are disabled. Consider the following code: @@ -741,11 +820,13 @@ The preemptoff tracer will record the to call_function_with_irqs_and_preemption_off() and call_function_with_preemption_off(). -But neither will trace the time that interrupts and/or preemption -is disabled. This total time is the time that we can not schedule. -To record this time, use the preemptirqsoff tracer. +But neither will trace the time that interrupts and/or +preemption is disabled. This total time is the time that we can +not schedule. To record this time, use the preemptirqsoff +tracer. -Again, using this trace is much like the irqsoff and preemptoff tracers. +Again, using this trace is much like the irqsoff and preemptoff +tracers. # echo preemptirqsoff > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -781,9 +862,10 @@ preemptirqsoff latency trace v1.1.5 on 2 The trace_hardirqs_off_thunk is called from assembly on x86 when -interrupts are disabled in the assembly code. Without the function -tracing, we do not know if interrupts were enabled within the preemption -points. We do see that it started with preemption enabled. +interrupts are disabled in the assembly code. Without the +function tracing, we do not know if interrupts were enabled +within the preemption points. We do see that it started with +preemption enabled. Here is a trace with ftrace_enabled set: @@ -871,40 +953,42 @@ preemptirqsoff latency trace v1.1.5 on 2 sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq) -This is a very interesting trace. It started with the preemption of -the ls task. We see that the task had the "need_resched" bit set -via the 'N' in the trace. Interrupts were disabled before the spin_lock -at the beginning of the trace. We see that a schedule took place to run -sshd. When the interrupts were enabled, we took an interrupt. -On return from the interrupt handler, the softirq ran. We took another -interrupt while running the softirq as we see from the capital 'H'. +This is a very interesting trace. It started with the preemption +of the ls task. We see that the task had the "need_resched" bit +set via the 'N' in the trace. Interrupts were disabled before +the spin_lock at the beginning of the trace. We see that a +schedule took place to run sshd. When the interrupts were +enabled, we took an interrupt. On return from the interrupt +handler, the softirq ran. We took another interrupt while +running the softirq as we see from the capital 'H'. wakeup ------ -In a Real-Time environment it is very important to know the wakeup -time it takes for the highest priority task that is woken up to the -time that it executes. This is also known as "schedule latency". -I stress the point that this is about RT tasks. It is also important -to know the scheduling latency of non-RT tasks, but the average -schedule latency is better for non-RT tasks. Tools like -LatencyTop are more appropriate for such measurements. +In a Real-Time environment it is very important to know the +wakeup time it takes for the highest priority task that is woken +up to the time that it executes. This is also known as "schedule +latency". I stress the point that this is about RT tasks. It is +also important to know the scheduling latency of non-RT tasks, +but the average schedule latency is better for non-RT tasks. +Tools like LatencyTop are more appropriate for such +measurements. Real-Time environments are interested in the worst case latency. -That is the longest latency it takes for something to happen, and -not the average. We can have a very fast scheduler that may only -have a large latency once in a while, but that would not work well -with Real-Time tasks. The wakeup tracer was designed to record -the worst case wakeups of RT tasks. Non-RT tasks are not recorded -because the tracer only records one worst case and tracing non-RT -tasks that are unpredictable will overwrite the worst case latency -of RT tasks. - -Since this tracer only deals with RT tasks, we will run this slightly -differently than we did with the previous tracers. Instead of performing -an 'ls', we will run 'sleep 1' under 'chrt' which changes the -priority of the task. +That is the longest latency it takes for something to happen, +and not the average. We can have a very fast scheduler that may +only have a large latency once in a while, but that would not +work well with Real-Time tasks. The wakeup tracer was designed +to record the worst case wakeups of RT tasks. Non-RT tasks are +not recorded because the tracer only records one worst case and +tracing non-RT tasks that are unpredictable will overwrite the +worst case latency of RT tasks. + +Since this tracer only deals with RT tasks, we will run this +slightly differently than we did with the previous tracers. +Instead of performing an 'ls', we will run 'sleep 1' under +'chrt' which changes the priority of the task. # echo wakeup > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -934,17 +1018,16 @@ wakeup latency trace v1.1.5 on 2.6.26-rc -0 1d..4 4us : schedule (cpu_idle) - -Running this on an idle system, we see that it only took 4 microseconds -to perform the task switch. Note, since the trace marker in the -schedule is before the actual "switch", we stop the tracing when -the recorded task is about to schedule in. This may change if -we add a new marker at the end of the scheduler. - -Notice that the recorded task is 'sleep' with the PID of 4901 and it -has an rt_prio of 5. This priority is user-space priority and not -the internal kernel priority. The policy is 1 for SCHED_FIFO and 2 -for SCHED_RR. +Running this on an idle system, we see that it only took 4 +microseconds to perform the task switch. Note, since the trace +marker in the schedule is before the actual "switch", we stop +the tracing when the recorded task is about to schedule in. This +may change if we add a new marker at the end of the scheduler. + +Notice that the recorded task is 'sleep' with the PID of 4901 +and it has an rt_prio of 5. This priority is user-space priority +and not the internal kernel priority. The policy is 1 for +SCHED_FIFO and 2 for SCHED_RR. Doing the same with chrt -r 5 and ftrace_enabled set. @@ -1001,24 +1084,25 @@ ksoftirq-7 1d..6 49us : _spin_unlo ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock) ksoftirq-7 1d..4 50us : schedule (__cond_resched) -The interrupt went off while running ksoftirqd. This task runs at -SCHED_OTHER. Why did not we see the 'N' set early? This may be -a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks -configured, the interrupt and softirq run with their own stack. -Some information is held on the top of the task's stack (need_resched -and preempt_count are both stored there). The setting of the NEED_RESCHED -bit is done directly to the task's stack, but the reading of the -NEED_RESCHED is done by looking at the current stack, which in this case -is the stack for the hard interrupt. This hides the fact that NEED_RESCHED -has been set. We do not see the 'N' until we switch back to the task's +The interrupt went off while running ksoftirqd. This task runs +at SCHED_OTHER. Why did not we see the 'N' set early? This may +be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K +stacks configured, the interrupt and softirq run with their own +stack. Some information is held on the top of the task's stack +(need_resched and preempt_count are both stored there). The +setting of the NEED_RESCHED bit is done directly to the task's +stack, but the reading of the NEED_RESCHED is done by looking at +the current stack, which in this case is the stack for the hard +interrupt. This hides the fact that NEED_RESCHED has been set. +We do not see the 'N' until we switch back to the task's assigned stack. function -------- This tracer is the function tracer. Enabling the function tracer -can be done from the debug file system. Make sure the ftrace_enabled is -set; otherwise this tracer is a nop. +can be done from the debug file system. Make sure the +ftrace_enabled is set; otherwise this tracer is a nop. # sysctl kernel.ftrace_enabled=1 # echo function > /debug/tracing/current_tracer @@ -1048,14 +1132,15 @@ set; otherwise this tracer is a nop. [...] -Note: function tracer uses ring buffers to store the above entries. -The newest data may overwrite the oldest data. Sometimes using echo to -stop the trace is not sufficient because the tracing could have overwritten -the data that you wanted to record. For this reason, it is sometimes better to -disable tracing directly from a program. This allows you to stop the -tracing at the point that you hit the part that you are interested in. -To disable the tracing directly from a C program, something like following -code snippet can be used: +Note: function tracer uses ring buffers to store the above +entries. The newest data may overwrite the oldest data. +Sometimes using echo to stop the trace is not sufficient because +the tracing could have overwritten the data that you wanted to +record. For this reason, it is sometimes better to disable +tracing directly from a program. This allows you to stop the +tracing at the point that you hit the part that you are +interested in. To disable the tracing directly from a C program, +something like following code snippet can be used: int trace_fd; [...] @@ -1070,10 +1155,10 @@ int main(int argc, char *argv[]) { } Note: Here we hard coded the path name. The debugfs mount is not -guaranteed to be at /debug (and is more commonly at /sys/kernel/debug). -For simple one time traces, the above is sufficent. For anything else, -a search through /proc/mounts may be needed to find where the debugfs -file-system is mounted. +guaranteed to be at /debug (and is more commonly at +/sys/kernel/debug). For simple one time traces, the above is +sufficent. For anything else, a search through /proc/mounts may +be needed to find where the debugfs file-system is mounted. Single thread tracing @@ -1152,49 +1237,297 @@ int main (int argc, char **argv) return 0; } + +hw-branch-tracer (x86 only) +--------------------------- + +This tracer uses the x86 last branch tracing hardware feature to +collect a branch trace on all cpus with relatively low overhead. + +The tracer uses a fixed-size circular buffer per cpu and only +traces ring 0 branches. The trace file dumps that buffer in the +following format: + +# tracer: hw-branch-tracer +# +# CPU# TO <- FROM + 0 scheduler_tick+0xb5/0x1bf <- task_tick_idle+0x5/0x6 + 2 run_posix_cpu_timers+0x2b/0x72a <- run_posix_cpu_timers+0x25/0x72a + 0 scheduler_tick+0x139/0x1bf <- scheduler_tick+0xed/0x1bf + 0 scheduler_tick+0x17c/0x1bf <- scheduler_tick+0x148/0x1bf + 2 run_posix_cpu_timers+0x9e/0x72a <- run_posix_cpu_timers+0x5e/0x72a + 0 scheduler_tick+0x1b6/0x1bf <- scheduler_tick+0x1aa/0x1bf + + +The tracer may be used to dump the trace for the oops'ing cpu on +a kernel oops into the system log. To enable this, +ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one +can either use the sysctl function or set it via the proc system +interface. + + sysctl kernel.ftrace_dump_on_oops=1 + +or + + echo 1 > /proc/sys/kernel/ftrace_dump_on_oops + + +Here's an example of such a dump after a null pointer +dereference in a kernel module: + +[57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 +[57848.106019] IP: [] open+0x6/0x14 [oops] +[57848.106019] PGD 2354e9067 PUD 2375e7067 PMD 0 +[57848.106019] Oops: 0002 [#1] SMP +[57848.106019] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:20:05.0/local_cpus +[57848.106019] Dumping ftrace buffer: +[57848.106019] --------------------------------- +[...] +[57848.106019] 0 chrdev_open+0xe6/0x165 <- cdev_put+0x23/0x24 +[57848.106019] 0 chrdev_open+0x117/0x165 <- chrdev_open+0xfa/0x165 +[57848.106019] 0 chrdev_open+0x120/0x165 <- chrdev_open+0x11c/0x165 +[57848.106019] 0 chrdev_open+0x134/0x165 <- chrdev_open+0x12b/0x165 +[57848.106019] 0 open+0x0/0x14 [oops] <- chrdev_open+0x144/0x165 +[57848.106019] 0 page_fault+0x0/0x30 <- open+0x6/0x14 [oops] +[57848.106019] 0 error_entry+0x0/0x5b <- page_fault+0x4/0x30 +[57848.106019] 0 error_kernelspace+0x0/0x31 <- error_entry+0x59/0x5b +[57848.106019] 0 error_sti+0x0/0x1 <- error_kernelspace+0x2d/0x31 +[57848.106019] 0 page_fault+0x9/0x30 <- error_sti+0x0/0x1 +[57848.106019] 0 do_page_fault+0x0/0x881 <- page_fault+0x1a/0x30 +[...] +[57848.106019] 0 do_page_fault+0x66b/0x881 <- is_prefetch+0x1ee/0x1f2 +[57848.106019] 0 do_page_fault+0x6e0/0x881 <- do_page_fault+0x67a/0x881 +[57848.106019] 0 oops_begin+0x0/0x96 <- do_page_fault+0x6e0/0x881 +[57848.106019] 0 trace_hw_branch_oops+0x0/0x2d <- oops_begin+0x9/0x96 +[...] +[57848.106019] 0 ds_suspend_bts+0x2a/0xe3 <- ds_suspend_bts+0x1a/0xe3 +[57848.106019] --------------------------------- +[57848.106019] CPU 0 +[57848.106019] Modules linked in: oops +[57848.106019] Pid: 5542, comm: cat Tainted: G W 2.6.28 #23 +[57848.106019] RIP: 0010:[] [] open+0x6/0x14 [oops] +[57848.106019] RSP: 0018:ffff880235457d48 EFLAGS: 00010246 +[...] + + +function graph tracer +--------------------------- + +This tracer is similar to the function tracer except that it +probes a function on its entry and its exit. This is done by +using a dynamically allocated stack of return addresses in each +task_struct. On function entry the tracer overwrites the return +address of each function traced to set a custom probe. Thus the +original return address is stored on the stack of return address +in the task_struct. + +Probing on both ends of a function leads to special features +such as: + +- measure of a function's time execution +- having a reliable call stack to draw function calls graph + +This tracer is useful in several situations: + +- you want to find the reason of a strange kernel behavior and + need to see what happens in detail on any areas (or specific + ones). + +- you are experiencing weird latencies but it's difficult to + find its origin. + +- you want to find quickly which path is taken by a specific + function + +- you just want to peek inside a working kernel and want to see + what happens there. + +# tracer: function_graph +# +# CPU DURATION FUNCTION CALLS +# | | | | | | | + + 0) | sys_open() { + 0) | do_sys_open() { + 0) | getname() { + 0) | kmem_cache_alloc() { + 0) 1.382 us | __might_sleep(); + 0) 2.478 us | } + 0) | strncpy_from_user() { + 0) | might_fault() { + 0) 1.389 us | __might_sleep(); + 0) 2.553 us | } + 0) 3.807 us | } + 0) 7.876 us | } + 0) | alloc_fd() { + 0) 0.668 us | _spin_lock(); + 0) 0.570 us | expand_files(); + 0) 0.586 us | _spin_unlock(); + + +There are several columns that can be dynamically +enabled/disabled. You can use every combination of options you +want, depending on your needs. + +- The cpu number on which the function executed is default + enabled. It is sometimes better to only trace one cpu (see + tracing_cpu_mask file) or you might sometimes see unordered + function calls while cpu tracing switch. + + hide: echo nofuncgraph-cpu > /debug/tracing/trace_options + show: echo funcgraph-cpu > /debug/tracing/trace_options + +- The duration (function's time of execution) is displayed on + the closing bracket line of a function or on the same line + than the current function in case of a leaf one. It is default + enabled. + + hide: echo nofuncgraph-duration > /debug/tracing/trace_options + show: echo funcgraph-duration > /debug/tracing/trace_options + +- The overhead field precedes the duration field in case of + reached duration thresholds. + + hide: echo nofuncgraph-overhead > /debug/tracing/trace_options + show: echo funcgraph-overhead > /debug/tracing/trace_options + depends on: funcgraph-duration + + ie: + + 0) | up_write() { + 0) 0.646 us | _spin_lock_irqsave(); + 0) 0.684 us | _spin_unlock_irqrestore(); + 0) 3.123 us | } + 0) 0.548 us | fput(); + 0) + 58.628 us | } + + [...] + + 0) | putname() { + 0) | kmem_cache_free() { + 0) 0.518 us | __phys_addr(); + 0) 1.757 us | } + 0) 2.861 us | } + 0) ! 115.305 us | } + 0) ! 116.402 us | } + + + means that the function exceeded 10 usecs. + ! means that the function exceeded 100 usecs. + + +- The task/pid field displays the thread cmdline and pid which + executed the function. It is default disabled. + + hide: echo nofuncgraph-proc > /debug/tracing/trace_options + show: echo funcgraph-proc > /debug/tracing/trace_options + + ie: + + # tracer: function_graph + # + # CPU TASK/PID DURATION FUNCTION CALLS + # | | | | | | | | | + 0) sh-4802 | | d_free() { + 0) sh-4802 | | call_rcu() { + 0) sh-4802 | | __call_rcu() { + 0) sh-4802 | 0.616 us | rcu_process_gp_end(); + 0) sh-4802 | 0.586 us | check_for_new_grace_period(); + 0) sh-4802 | 2.899 us | } + 0) sh-4802 | 4.040 us | } + 0) sh-4802 | 5.151 us | } + 0) sh-4802 | + 49.370 us | } + + +- The absolute time field is an absolute timestamp given by the + system clock since it started. A snapshot of this time is + given on each entry/exit of functions + + hide: echo nofuncgraph-abstime > /debug/tracing/trace_options + show: echo funcgraph-abstime > /debug/tracing/trace_options + + ie: + + # + # TIME CPU DURATION FUNCTION CALLS + # | | | | | | | | + 360.774522 | 1) 0.541 us | } + 360.774522 | 1) 4.663 us | } + 360.774523 | 1) 0.541 us | __wake_up_bit(); + 360.774524 | 1) 6.796 us | } + 360.774524 | 1) 7.952 us | } + 360.774525 | 1) 9.063 us | } + 360.774525 | 1) 0.615 us | journal_mark_dirty(); + 360.774527 | 1) 0.578 us | __brelse(); + 360.774528 | 1) | reiserfs_prepare_for_journal() { + 360.774528 | 1) | unlock_buffer() { + 360.774529 | 1) | wake_up_bit() { + 360.774529 | 1) | bit_waitqueue() { + 360.774530 | 1) 0.594 us | __phys_addr(); + + +You can put some comments on specific functions by using +trace_printk() For example, if you want to put a comment inside +the __might_sleep() function, you just have to include + and call trace_printk() inside __might_sleep() + +trace_printk("I'm a comment!\n") + +will produce: + + 1) | __might_sleep() { + 1) | /* I'm a comment! */ + 1) 1.449 us | } + + +You might find other useful features for this tracer in the +following "dynamic ftrace" section such as tracing only specific +functions or tasks. + dynamic ftrace -------------- If CONFIG_DYNAMIC_FTRACE is set, the system will run with virtually no overhead when function tracing is disabled. The way this works is the mcount function call (placed at the start of -every kernel function, produced by the -pg switch in gcc), starts -of pointing to a simple return. (Enabling FTRACE will include the --pg switch in the compiling of the kernel.) +every kernel function, produced by the -pg switch in gcc), +starts of pointing to a simple return. (Enabling FTRACE will +include the -pg switch in the compiling of the kernel.) At compile time every C file object is run through the recordmcount.pl script (located in the scripts directory). This script will process the C object using objdump to find all the -locations in the .text section that call mcount. (Note, only -the .text section is processed, since processing other sections -like .init.text may cause races due to those sections being freed). - -A new section called "__mcount_loc" is created that holds references -to all the mcount call sites in the .text section. This section is -compiled back into the original object. The final linker will add -all these references into a single table. +locations in the .text section that call mcount. (Note, only the +.text section is processed, since processing other sections like +.init.text may cause races due to those sections being freed). + +A new section called "__mcount_loc" is created that holds +references to all the mcount call sites in the .text section. +This section is compiled back into the original object. The +final linker will add all these references into a single table. On boot up, before SMP is initialized, the dynamic ftrace code -scans this table and updates all the locations into nops. It also -records the locations, which are added to the available_filter_functions -list. Modules are processed as they are loaded and before they are -executed. When a module is unloaded, it also removes its functions from -the ftrace function list. This is automatic in the module unload -code, and the module author does not need to worry about it. - -When tracing is enabled, kstop_machine is called to prevent races -with the CPUS executing code being modified (which can cause the -CPU to do undesireable things), and the nops are patched back -to calls. But this time, they do not call mcount (which is just -a function stub). They now call into the ftrace infrastructure. +scans this table and updates all the locations into nops. It +also records the locations, which are added to the +available_filter_functions list. Modules are processed as they +are loaded and before they are executed. When a module is +unloaded, it also removes its functions from the ftrace function +list. This is automatic in the module unload code, and the +module author does not need to worry about it. + +When tracing is enabled, kstop_machine is called to prevent +races with the CPUS executing code being modified (which can +cause the CPU to do undesireable things), and the nops are +patched back to calls. But this time, they do not call mcount +(which is just a function stub). They now call into the ftrace +infrastructure. One special side-effect to the recording of the functions being traced is that we can now selectively choose which functions we -wish to trace and which ones we want the mcount calls to remain as -nops. +wish to trace and which ones we want the mcount calls to remain +as nops. -Two files are used, one for enabling and one for disabling the tracing -of specified functions. They are: +Two files are used, one for enabling and one for disabling the +tracing of specified functions. They are: set_ftrace_filter @@ -1202,8 +1535,8 @@ and set_ftrace_notrace -A list of available functions that you can add to these files is listed -in: +A list of available functions that you can add to these files is +listed in: available_filter_functions @@ -1240,8 +1573,8 @@ hrtimer_interrupt sys_nanosleep -Perhaps this is not enough. The filters also allow simple wild cards. -Only the following are currently available +Perhaps this is not enough. The filters also allow simple wild +cards. Only the following are currently available * - will match functions that begin with * - will match functions that end with @@ -1251,9 +1584,9 @@ These are the only wild cards which are * will not work. -Note: It is better to use quotes to enclose the wild cards, otherwise - the shell may expand the parameters into names of files in the local - directory. +Note: It is better to use quotes to enclose the wild cards, + otherwise the shell may expand the parameters into names + of files in the local directory. # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter @@ -1299,7 +1632,8 @@ This is because the '>' and '>>' act jus To rewrite the filters, use '>' To append to the filters, use '>>' -To clear out a filter so that all functions will be recorded again: +To clear out a filter so that all functions will be recorded +again: # echo > /debug/tracing/set_ftrace_filter # cat /debug/tracing/set_ftrace_filter @@ -1331,7 +1665,8 @@ hrtimer_get_res hrtimer_init_sleeper -The set_ftrace_notrace prevents those functions from being traced. +The set_ftrace_notrace prevents those functions from being +traced. # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace @@ -1353,13 +1688,75 @@ Produces: We can see that there's no more lock or preempt tracing. + +Dynamic ftrace with the function graph tracer +--------------------------------------------- + +Although what has been explained above concerns both the +function tracer and the function-graph-tracer, there are some +special features only available in the function-graph tracer. + +If you want to trace only one function and all of its children, +you just have to echo its name into set_graph_function: + + echo __do_fault > set_graph_function + +will produce the following "expanded" trace of the __do_fault() +function: + + 0) | __do_fault() { + 0) | filemap_fault() { + 0) | find_lock_page() { + 0) 0.804 us | find_get_page(); + 0) | __might_sleep() { + 0) 1.329 us | } + 0) 3.904 us | } + 0) 4.979 us | } + 0) 0.653 us | _spin_lock(); + 0) 0.578 us | page_add_file_rmap(); + 0) 0.525 us | native_set_pte_at(); + 0) 0.585 us | _spin_unlock(); + 0) | unlock_page() { + 0) 0.541 us | page_waitqueue(); + 0) 0.639 us | __wake_up_bit(); + 0) 2.786 us | } + 0) + 14.237 us | } + 0) | __do_fault() { + 0) | filemap_fault() { + 0) | find_lock_page() { + 0) 0.698 us | find_get_page(); + 0) | __might_sleep() { + 0) 1.412 us | } + 0) 3.950 us | } + 0) 5.098 us | } + 0) 0.631 us | _spin_lock(); + 0) 0.571 us | page_add_file_rmap(); + 0) 0.526 us | native_set_pte_at(); + 0) 0.586 us | _spin_unlock(); + 0) | unlock_page() { + 0) 0.533 us | page_waitqueue(); + 0) 0.638 us | __wake_up_bit(); + 0) 2.793 us | } + 0) + 14.012 us | } + +You can also expand several functions at once: + + echo sys_open > set_graph_function + echo sys_close >> set_graph_function + +Now if you want to go back to trace all functions you can clear +this special filter via: + + echo > set_graph_function + + trace_pipe ---------- -The trace_pipe outputs the same content as the trace file, but the effect -on the tracing is different. Every read from trace_pipe is consumed. -This means that subsequent reads will be different. The trace -is live. +The trace_pipe outputs the same content as the trace file, but +the effect on the tracing is different. Every read from +trace_pipe is consumed. This means that subsequent reads will be +different. The trace is live. # echo function > /debug/tracing/current_tracer # cat /debug/tracing/trace_pipe > /tmp/trace.out & @@ -1387,38 +1784,45 @@ is live. bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up -Note, reading the trace_pipe file will block until more input is added. -By changing the tracer, trace_pipe will issue an EOF. We needed -to set the function tracer _before_ we "cat" the trace_pipe file. +Note, reading the trace_pipe file will block until more input is +added. By changing the tracer, trace_pipe will issue an EOF. We +needed to set the function tracer _before_ we "cat" the +trace_pipe file. trace entries ------------- -Having too much or not enough data can be troublesome in diagnosing -an issue in the kernel. The file buffer_size_kb is used to modify -the size of the internal trace buffers. The number listed -is the number of entries that can be recorded per CPU. To know -the full size, multiply the number of possible CPUS with the -number of entries. +Having too much or not enough data can be troublesome in +diagnosing an issue in the kernel. The file buffer_size_kb is +used to modify the size of the internal trace buffers. The +number listed is the number of entries that can be recorded per +CPU. To know the full size, multiply the number of possible CPUS +with the number of entries. # cat /debug/tracing/buffer_size_kb 1408 (units kilobytes) -Note, to modify this, you must have tracing completely disabled. To do that, -echo "nop" into the current_tracer. If the current_tracer is not set -to "nop", an EINVAL error will be returned. +Note, to modify this, you must have tracing completely disabled. +To do that, echo "nop" into the current_tracer. If the +current_tracer is not set to "nop", an EINVAL error will be +returned. # echo nop > /debug/tracing/current_tracer # echo 10000 > /debug/tracing/buffer_size_kb # cat /debug/tracing/buffer_size_kb 10000 (units kilobytes) -The number of pages which will be allocated is limited to a percentage -of available memory. Allocating too much will produce an error. +The number of pages which will be allocated is limited to a +percentage of available memory. Allocating too much will produce +an error. # echo 1000000000000 > /debug/tracing/buffer_size_kb -bash: echo: write error: Cannot allocate memory # cat /debug/tracing/buffer_size_kb 85 +----------- + +More details can be found in the source code, in the +kernel/tracing/*.c files. Index: linux-2.6-tip/Documentation/kernel-parameters.txt =================================================================== --- linux-2.6-tip.orig/Documentation/kernel-parameters.txt +++ linux-2.6-tip/Documentation/kernel-parameters.txt @@ -49,6 +49,7 @@ parameter is applicable: ISAPNP ISA PnP code is enabled. ISDN Appropriate ISDN support is enabled. JOY Appropriate joystick support is enabled. + KMEMTRACE kmemtrace is enabled. LIBATA Libata driver is enabled LP Printer support is enabled. LOOP Loopback device support is enabled. @@ -491,11 +492,23 @@ and is between 256 and 4096 characters. Range: 0 - 8192 Default: 64 + dma_debug=off If the kernel is compiled with DMA_API_DEBUG support + this option disables the debugging code at boot. + + dma_debug_entries= + This option allows to tune the number of preallocated + entries for DMA-API debugging code. One entry is + required per DMA-API allocation. Use this if the + DMA-API debugging code disables itself because the + architectural default is too low. + hpet= [X86-32,HPET] option to control HPET usage - Format: { enable (default) | disable | force } + Format: { enable (default) | disable | force | + verbose } disable: disable HPET and use PIT instead force: allow force enabled of undocumented chips (ICH4, VIA, nVidia) + verbose: show contents of HPET registers during setup com20020= [HW,NET] ARCnet - COM20020 chipset Format: @@ -604,6 +617,9 @@ and is between 256 and 4096 characters. debug_objects [KNL] Enable object debugging + no_debug_objects + [KNL] Disable object debugging + debugpat [X86] Enable PAT debugging decnet.addr= [HW,NET] @@ -1047,6 +1063,15 @@ and is between 256 and 4096 characters. use the HighMem zone if it exists, and the Normal zone if it does not. + kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no } + Controls whether kmemtrace is enabled + at boot-time. + + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of + subbufs kmemtrace's relay channel has. Set this + higher than default (KMEMTRACE_N_SUBBUFS in code) if + you experience buffer overruns. + movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter is similar to kernelcore except it specifies the amount of memory used for migratable allocations. @@ -1310,8 +1335,13 @@ and is between 256 and 4096 characters. memtest= [KNL,X86] Enable memtest Format: - range: 0,4 : pattern number default : 0 + Specifies the number of memtest passes to be + performed. Each pass selects another test + pattern from a given set of patterns. Memtest + fills the memory with this pattern, validates + memory contents and reserves bad memory + regions that are detected. meye.*= [HW] Set MotionEye Camera parameters See Documentation/video4linux/meye.txt. @@ -2329,6 +2359,8 @@ and is between 256 and 4096 characters. tp720= [HW,PS2] + trace_buf_size=nn[KMG] [ftrace] will set tracing buffer size. + trix= [HW,OSS] MediaTrix AudioTrix Pro Format: ,,,,,,,, Index: linux-2.6-tip/Documentation/kmemcheck.txt =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/kmemcheck.txt @@ -0,0 +1,129 @@ +Contents +======== + + 1. How to use + 2. Technical description + 3. Changes to the slab allocators + 4. Problems + 5. Parameters + 6. Future enhancements + + +How to use (IMPORTANT) +====================== + +Always remember this: kmemcheck _will_ give false positives. So don't enable +it and spam the mailing list with its reports; you are not going to be heard, +and it will make people's skins thicker for when the real errors are found. + +Instead, I encourage maintainers and developers to find errors in _their_ +_own_ code. And if you find false positives, you can try to work around them, +try to figure out if it's a real bug or not, or simply ignore them. Most +developers know their own code and will quickly and efficiently determine the +root cause of a kmemcheck report. This is therefore also the most efficient +way to work with kmemcheck. + +If you still want to run kmemcheck to inspect others' code, the rule of thumb +should be: If it's not obvious (to you), don't tell us about it either. Most +likely the code is correct and you'll only waste our time. If you can work +out the error, please do send the maintainer a heads up and/or a patch, but +don't expect him/her to fix something that wasn't wrong in the first place. + + +Technical description +===================== + +kmemcheck works by marking memory pages non-present. This means that whenever +somebody attempts to access the page, a page fault is generated. The page +fault handler notices that the page was in fact only hidden, and so it calls +on the kmemcheck code to make further investigations. + +When the investigations are completed, kmemcheck "shows" the page by marking +it present (as it would be under normal circumstances). This way, the +interrupted code can continue as usual. + +But after the instruction has been executed, we should hide the page again, so +that we can catch the next access too! Now kmemcheck makes use of a debugging +feature of the processor, namely single-stepping. When the processor has +finished the one instruction that generated the memory access, a debug +exception is raised. From here, we simply hide the page again and continue +execution, this time with the single-stepping feature turned off. + + +Changes to the slab allocators +============================== + +kmemcheck requires some assistance from the memory allocator in order to work. +The memory allocator needs to + +1. Tell kmemcheck about newly allocated pages and pages that are about to + be freed. This allows kmemcheck to set up and tear down the shadow memory + for the pages in question. The shadow memory stores the status of each byte + in the allocation proper, e.g. whether it is initialized or uninitialized. +2. Tell kmemcheck which parts of memory should be marked uninitialized. There + are actually a few more states, such as "not yet allocated" and "recently + freed". + +If a slab cache is set up using the SLAB_NOTRACK flag, it will never return +memory that can take page faults because of kmemcheck. + +If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still +request memory with the __GFP_NOTRACK flag. This does not prevent the page +faults from occurring, however, but marks the object in question as being +initialized so that no warnings will ever be produced for this object. + +Currently, the SLAB and SLUB allocators are supported by kmemcheck. + + +Problems +======== + +The most prominent problem seems to be that of bit-fields. kmemcheck can only +track memory with byte granularity. Therefore, when gcc generates code to +access only one bit in a bit-field, there is really no way for kmemcheck to +know which of the other bits will be used or thrown away. Consequently, there +may be bogus warnings for bit-field accesses. We have added a "bitfields" API +to get around this problem. See include/linux/kmemcheck.h for detailed +instructions! + + +Parameters +========== + +In addition to enabling CONFIG_KMEMCHECK before the kernel is compiled, the +parameter kmemcheck=1 must be passed to the kernel when it is started in order +to actually do the tracking. So by default, there is only a very small +(probably negligible) overhead for enabling the config option. + +Similarly, kmemcheck may be turned on or off at run-time using, respectively: + +echo 1 > /proc/sys/kernel/kmemcheck + and +echo 0 > /proc/sys/kernel/kmemcheck + +Note that this is a lazy setting; once turned off, the old allocations will +still have to take a single page fault exception before tracking is turned off +for that particular page. Enabling kmemcheck on will only enable tracking for +allocations made from that point onwards. + +The default mode is the one-shot mode, where only the first error is reported +before kmemcheck is disabled. This mode can be enabled by passing kmemcheck=2 +to the kernel at boot, or running + +echo 2 > /proc/sys/kernel/kmemcheck + +when the kernel is already running. + + +Future enhancements +=================== + +There is already some preliminary support for catching use-after-free errors. +What still needs to be done is delaying kfree() so that memory is not +reallocated immediately after freeing it. [Suggested by Pekka Enberg.] + +It should be possible to allow SMP systems by duplicating the page tables for +each processor in the system. This is probably extremely difficult, however. +[Suggested by Ingo Molnar.] + +Support for instruction set extensions like XMM, SSE2, etc. Index: linux-2.6-tip/Documentation/lockdep-design.txt =================================================================== --- linux-2.6-tip.orig/Documentation/lockdep-design.txt +++ linux-2.6-tip/Documentation/lockdep-design.txt @@ -27,33 +27,37 @@ lock-class. State ----- -The validator tracks lock-class usage history into 5 separate state bits: +The validator tracks lock-class usage history into 4n + 1 separate state bits: -- 'ever held in hardirq context' [ == hardirq-safe ] -- 'ever held in softirq context' [ == softirq-safe ] -- 'ever held with hardirqs enabled' [ == hardirq-unsafe ] -- 'ever held with softirqs and hardirqs enabled' [ == softirq-unsafe ] +- 'ever held in STATE context' +- 'ever head as readlock in STATE context' +- 'ever head with STATE enabled' +- 'ever head as readlock with STATE enabled' + +Where STATE can be either one of (kernel/lockdep_states.h) + - hardirq + - softirq + - reclaim_fs - 'ever used' [ == !unused ] -When locking rules are violated, these 4 state bits are presented in the -locking error messages, inside curlies. A contrived example: +When locking rules are violated, these state bits are presented in the +locking error messages, inside curlies. A contrived example: modprobe/2287 is trying to acquire lock: - (&sio_locks[i].lock){--..}, at: [] mutex_lock+0x21/0x24 + (&sio_locks[i].lock){-.-...}, at: [] mutex_lock+0x21/0x24 but task is already holding lock: - (&sio_locks[i].lock){--..}, at: [] mutex_lock+0x21/0x24 + (&sio_locks[i].lock){-.-...}, at: [] mutex_lock+0x21/0x24 -The bit position indicates hardirq, softirq, hardirq-read, -softirq-read respectively, and the character displayed in each -indicates: +The bit position indicates STATE, STATE-read, for each of the states listed +above, and the character displayed in each indicates: '.' acquired while irqs disabled '+' acquired in irq context '-' acquired with irqs enabled - '?' read acquired in irq context with irqs enabled. + '?' acquired in irq context with irqs enabled. Unused mutexes cannot be part of the cause of an error. Index: linux-2.6-tip/Documentation/perf_counter/Makefile =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/perf_counter/Makefile @@ -0,0 +1,12 @@ +BINS = kerneltop perfstat + +all: $(BINS) + +kerneltop: kerneltop.c ../../include/linux/perf_counter.h + cc -O6 -Wall -lrt -o $@ $< + +perfstat: kerneltop + ln -sf kerneltop perfstat + +clean: + rm $(BINS) Index: linux-2.6-tip/Documentation/perf_counter/design.txt =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/perf_counter/design.txt @@ -0,0 +1,283 @@ + +Performance Counters for Linux +------------------------------ + +Performance counters are special hardware registers available on most modern +CPUs. These registers count the number of certain types of hw events: such +as instructions executed, cachemisses suffered, or branches mis-predicted - +without slowing down the kernel or applications. These registers can also +trigger interrupts when a threshold number of events have passed - and can +thus be used to profile the code that runs on that CPU. + +The Linux Performance Counter subsystem provides an abstraction of these +hardware capabilities. It provides per task and per CPU counters, counter +groups, and it provides event capabilities on top of those. It +provides "virtual" 64-bit counters, regardless of the width of the +underlying hardware counters. + +Performance counters are accessed via special file descriptors. +There's one file descriptor per virtual counter used. + +The special file descriptor is opened via the perf_counter_open() +system call: + + int sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr, + pid_t pid, int cpu, int group_fd, + unsigned long flags); + +The syscall returns the new fd. The fd can be used via the normal +VFS system calls: read() can be used to read the counter, fcntl() +can be used to set the blocking mode, etc. + +Multiple counters can be kept open at a time, and the counters +can be poll()ed. + +When creating a new counter fd, 'perf_counter_hw_event' is: + +/* + * Event to monitor via a performance monitoring counter: + */ +struct perf_counter_hw_event { + __u64 event_config; + + __u64 irq_period; + __u64 record_type; + __u64 read_format; + + __u64 disabled : 1, /* off by default */ + nmi : 1, /* NMI sampling */ + inherit : 1, /* children inherit it */ + pinned : 1, /* must always be on PMU */ + exclusive : 1, /* only group on PMU */ + exclude_user : 1, /* don't count user */ + exclude_kernel : 1, /* ditto kernel */ + exclude_hv : 1, /* ditto hypervisor */ + exclude_idle : 1, /* don't count when idle */ + + __reserved_1 : 55; + + __u32 extra_config_len; + + __u32 __reserved_4; + __u64 __reserved_2; + __u64 __reserved_3; +}; + +The 'event_config' field specifies what the counter should count. It +is divided into 3 bit-fields: + +raw_type: 1 bit (most significant bit) 0x8000_0000_0000_0000 +type: 7 bits (next most significant) 0x7f00_0000_0000_0000 +event_id: 56 bits (least significant) 0x00ff_0000_0000_0000 + +If 'raw_type' is 1, then the counter will count a hardware event +specified by the remaining 63 bits of event_config. The encoding is +machine-specific. + +If 'raw_type' is 0, then the 'type' field says what kind of counter +this is, with the following encoding: + +enum perf_event_types { + PERF_TYPE_HARDWARE = 0, + PERF_TYPE_SOFTWARE = 1, + PERF_TYPE_TRACEPOINT = 2, +}; + +A counter of PERF_TYPE_HARDWARE will count the hardware event +specified by 'event_id': + +/* + * Generalized performance counter event types, used by the hw_event.event_id + * parameter of the sys_perf_counter_open() syscall: + */ +enum hw_event_ids { + /* + * Common hardware events, generalized by the kernel: + */ + PERF_COUNT_CPU_CYCLES = 0, + PERF_COUNT_INSTRUCTIONS = 1, + PERF_COUNT_CACHE_REFERENCES = 2, + PERF_COUNT_CACHE_MISSES = 3, + PERF_COUNT_BRANCH_INSTRUCTIONS = 4, + PERF_COUNT_BRANCH_MISSES = 5, + PERF_COUNT_BUS_CYCLES = 6, +}; + +These are standardized types of events that work relatively uniformly +on all CPUs that implement Performance Counters support under Linux, +although there may be variations (e.g., different CPUs might count +cache references and misses at different levels of the cache hierarchy). +If a CPU is not able to count the selected event, then the system call +will return -EINVAL. + +More hw_event_types are supported as well, but they are CPU-specific +and accessed as raw events. For example, to count "External bus +cycles while bus lock signal asserted" events on Intel Core CPUs, pass +in a 0x4064 event_id value and set hw_event.raw_type to 1. + +A counter of type PERF_TYPE_SOFTWARE will count one of the available +software events, selected by 'event_id': + +/* + * Special "software" counters provided by the kernel, even if the hardware + * does not support performance counters. These counters measure various + * physical and sw events of the kernel (and allow the profiling of them as + * well): + */ +enum sw_event_ids { + PERF_COUNT_CPU_CLOCK = 0, + PERF_COUNT_TASK_CLOCK = 1, + PERF_COUNT_PAGE_FAULTS = 2, + PERF_COUNT_CONTEXT_SWITCHES = 3, + PERF_COUNT_CPU_MIGRATIONS = 4, + PERF_COUNT_PAGE_FAULTS_MIN = 5, + PERF_COUNT_PAGE_FAULTS_MAJ = 6, +}; + +Counters come in two flavours: counting counters and sampling +counters. A "counting" counter is one that is used for counting the +number of events that occur, and is characterised by having +irq_period = 0 and record_type = PERF_RECORD_SIMPLE. A read() on a +counting counter simply returns the current value of the counter as +an 8-byte number. + +A "sampling" counter is one that is set up to generate an interrupt +every N events, where N is given by 'irq_period'. A sampling counter +has irq_period > 0 and record_type != PERF_RECORD_SIMPLE. The +record_type controls what data is recorded on each interrupt, and the +available values are currently: + +/* + * IRQ-notification data record type: + */ +enum perf_counter_record_type { + PERF_RECORD_SIMPLE = 0, + PERF_RECORD_IRQ = 1, + PERF_RECORD_GROUP = 2, +}; + +A record_type value of PERF_RECORD_IRQ will record the instruction +pointer (IP) at which the interrupt occurred. A record_type value of +PERF_RECORD_GROUP will record the event_config and counter value of +all of the other counters in the group, and should only be used on a +group leader (see below). Currently these two values are mutually +exclusive, but record_type will become a bit-mask in future and +support other values. + +A sampling counter has an event queue, into which an event is placed +on each interrupt. A read() on a sampling counter will read the next +event from the event queue. If the queue is empty, the read() will +either block or return an EAGAIN error, depending on whether the fd +has been set to non-blocking mode or not. + +The 'disabled' bit specifies whether the counter starts out disabled +or enabled. If it is initially disabled, it can be enabled by ioctl +or prctl (see below). + +The 'nmi' bit specifies, for hardware events, whether the counter +should be set up to request non-maskable interrupts (NMIs) or normal +interrupts. This bit is ignored if the user doesn't have +CAP_SYS_ADMIN privilege (i.e. is not root) or if the CPU doesn't +generate NMIs from hardware counters. + +The 'inherit' bit, if set, specifies that this counter should count +events on descendant tasks as well as the task specified. This only +applies to new descendents, not to any existing descendents at the +time the counter is created (nor to any new descendents of existing +descendents). + +The 'pinned' bit, if set, specifies that the counter should always be +on the CPU if at all possible. It only applies to hardware counters +and only to group leaders. If a pinned counter cannot be put onto the +CPU (e.g. because there are not enough hardware counters or because of +a conflict with some other event), then the counter goes into an +'error' state, where reads return end-of-file (i.e. read() returns 0) +until the counter is subsequently enabled or disabled. + +The 'exclusive' bit, if set, specifies that when this counter's group +is on the CPU, it should be the only group using the CPU's counters. +In future, this will allow sophisticated monitoring programs to supply +extra configuration information via 'extra_config_len' to exploit +advanced features of the CPU's Performance Monitor Unit (PMU) that are +not otherwise accessible and that might disrupt other hardware +counters. + +The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a +way to request that counting of events be restricted to times when the +CPU is in user, kernel and/or hypervisor mode. + + +The 'pid' parameter to the perf_counter_open() system call allows the +counter to be specific to a task: + + pid == 0: if the pid parameter is zero, the counter is attached to the + current task. + + pid > 0: the counter is attached to a specific task (if the current task + has sufficient privilege to do so) + + pid < 0: all tasks are counted (per cpu counters) + +The 'cpu' parameter allows a counter to be made specific to a CPU: + + cpu >= 0: the counter is restricted to a specific CPU + cpu == -1: the counter counts on all CPUs + +(Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.) + +A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts +events of that task and 'follows' that task to whatever CPU the task +gets schedule to. Per task counters can be created by any user, for +their own tasks. + +A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts +all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege. + +The 'flags' parameter is currently unused and must be zero. + +The 'group_fd' parameter allows counter "groups" to be set up. A +counter group has one counter which is the group "leader". The leader +is created first, with group_fd = -1 in the perf_counter_open call +that creates it. The rest of the group members are created +subsequently, with group_fd giving the fd of the group leader. +(A single counter on its own is created with group_fd = -1 and is +considered to be a group with only 1 member.) + +A counter group is scheduled onto the CPU as a unit, that is, it will +only be put onto the CPU if all of the counters in the group can be +put onto the CPU. This means that the values of the member counters +can be meaningfully compared, added, divided (to get ratios), etc., +with each other, since they have counted events for the same set of +executed instructions. + +Counters can be enabled and disabled in two ways: via ioctl and via +prctl. When a counter is disabled, it doesn't count or generate +events but does continue to exist and maintain its count value. + +An individual counter or counter group can be enabled with + + ioctl(fd, PERF_COUNTER_IOC_ENABLE); + +or disabled with + + ioctl(fd, PERF_COUNTER_IOC_DISABLE); + +Enabling or disabling the leader of a group enables or disables the +whole group; that is, while the group leader is disabled, none of the +counters in the group will count. Enabling or disabling a member of a +group other than the leader only affects that counter - disabling an +non-leader stops that counter from counting but doesn't affect any +other counter. + +A process can enable or disable all the counter groups that are +attached to it, using prctl: + + prctl(PR_TASK_PERF_COUNTERS_ENABLE); + + prctl(PR_TASK_PERF_COUNTERS_DISABLE); + +This applies to all counters on the current process, whether created +by this process or by another, and doesn't affect any counters that +this process has created on other processes. It only enables or +disables the group leaders, not any other members in the groups. + Index: linux-2.6-tip/Documentation/perf_counter/kerneltop.c =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/perf_counter/kerneltop.c @@ -0,0 +1,1328 @@ +/* + * kerneltop.c: show top kernel functions - performance counters showcase + + Build with: + + cc -O6 -Wall -c -o kerneltop.o kerneltop.c -lrt + + Sample output: + +------------------------------------------------------------------------------ + KernelTop: 2669 irqs/sec [NMI, cache-misses/cache-refs], (all, cpu: 2) +------------------------------------------------------------------------------ + + weight RIP kernel function + ______ ________________ _______________ + + 35.20 - ffffffff804ce74b : skb_copy_and_csum_dev + 33.00 - ffffffff804cb740 : sock_alloc_send_skb + 31.26 - ffffffff804ce808 : skb_push + 22.43 - ffffffff80510004 : tcp_established_options + 19.00 - ffffffff8027d250 : find_get_page + 15.76 - ffffffff804e4fc9 : eth_type_trans + 15.20 - ffffffff804d8baa : dst_release + 14.86 - ffffffff804cf5d8 : skb_release_head_state + 14.00 - ffffffff802217d5 : read_hpet + 12.00 - ffffffff804ffb7f : __ip_local_out + 11.97 - ffffffff804fc0c8 : ip_local_deliver_finish + 8.54 - ffffffff805001a3 : ip_queue_xmit + */ + +/* + * perfstat: /usr/bin/time -alike performance counter statistics utility + + It summarizes the counter events of all tasks (and child tasks), + covering all CPUs that the command (or workload) executes on. + It only counts the per-task events of the workload started, + independent of how many other tasks run on those CPUs. + + Sample output: + + $ ./perfstat -e 1 -e 3 -e 5 ls -lR /usr/include/ >/dev/null + + Performance counter stats for 'ls': + + 163516953 instructions + 2295 cache-misses + 2855182 branch-misses + */ + + /* + * Copyright (C) 2008, Red Hat Inc, Ingo Molnar + * + * Improvements and fixes by: + * + * Arjan van de Ven + * Yanmin Zhang + * Wu Fengguang + * Mike Galbraith + * Paul Mackerras + * + * Released under the GPL v2. (and only v2, not any later version) + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "../../include/linux/perf_counter.h" + + +/* + * prctl(PR_TASK_PERF_COUNTERS_DISABLE) will (cheaply) disable all + * counters in the current task. + */ +#define PR_TASK_PERF_COUNTERS_DISABLE 31 +#define PR_TASK_PERF_COUNTERS_ENABLE 32 + +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) + +#define rdclock() \ +({ \ + struct timespec ts; \ + \ + clock_gettime(CLOCK_MONOTONIC, &ts); \ + ts.tv_sec * 1000000000ULL + ts.tv_nsec; \ +}) + +/* + * Pick up some kernel type conventions: + */ +#define __user +#define asmlinkage + +#ifdef __x86_64__ +#define __NR_perf_counter_open 298 +#define rmb() asm volatile("lfence" ::: "memory") +#define cpu_relax() asm volatile("rep; nop" ::: "memory"); +#endif + +#ifdef __i386__ +#define __NR_perf_counter_open 336 +#define rmb() asm volatile("lfence" ::: "memory") +#define cpu_relax() asm volatile("rep; nop" ::: "memory"); +#endif + +#ifdef __powerpc__ +#define __NR_perf_counter_open 319 +#define rmb() asm volatile ("sync" ::: "memory") +#define cpu_relax() asm volatile ("" ::: "memory"); +#endif + +#define unlikely(x) __builtin_expect(!!(x), 0) +#define min(x, y) ({ \ + typeof(x) _min1 = (x); \ + typeof(y) _min2 = (y); \ + (void) (&_min1 == &_min2); \ + _min1 < _min2 ? _min1 : _min2; }) + +asmlinkage int sys_perf_counter_open( + struct perf_counter_hw_event *hw_event_uptr __user, + pid_t pid, + int cpu, + int group_fd, + unsigned long flags) +{ + return syscall( + __NR_perf_counter_open, hw_event_uptr, pid, cpu, group_fd, flags); +} + +#define MAX_COUNTERS 64 +#define MAX_NR_CPUS 256 + +#define EID(type, id) (((__u64)(type) << PERF_COUNTER_TYPE_SHIFT) | (id)) + +static int run_perfstat = 0; +static int system_wide = 0; + +static int nr_counters = 0; +static __u64 event_id[MAX_COUNTERS] = { + EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), + EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), + EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), + EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), + + EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), + EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), + EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), + EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), +}; +static int default_interval = 100000; +static int event_count[MAX_COUNTERS]; +static int fd[MAX_NR_CPUS][MAX_COUNTERS]; + +static __u64 count_filter = 100; + +static int tid = -1; +static int profile_cpu = -1; +static int nr_cpus = 0; +static int nmi = 1; +static int group = 0; +static unsigned int page_size; +static unsigned int mmap_pages = 16; + +static char *vmlinux; + +static char *sym_filter; +static unsigned long filter_start; +static unsigned long filter_end; + +static int delay_secs = 2; +static int zero; +static int dump_symtab; + +struct source_line { + uint64_t EIP; + unsigned long count; + char *line; + struct source_line *next; +}; + +static struct source_line *lines; +static struct source_line **lines_tail; + +const unsigned int default_count[] = { + 1000000, + 1000000, + 10000, + 10000, + 1000000, + 10000, +}; + +static char *hw_event_names[] = { + "CPU cycles", + "instructions", + "cache references", + "cache misses", + "branches", + "branch misses", + "bus cycles", +}; + +static char *sw_event_names[] = { + "cpu clock ticks", + "task clock ticks", + "pagefaults", + "context switches", + "CPU migrations", + "minor faults", + "major faults", +}; + +struct event_symbol { + __u64 event; + char *symbol; +}; + +static struct event_symbol event_symbols[] = { + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cpu-cycles", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cycles", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), "instructions", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), "cache-references", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), "cache-misses", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branch-instructions", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branches", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_MISSES), "branch-misses", }, + {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BUS_CYCLES), "bus-cycles", }, + + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK), "cpu-clock", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), "task-clock", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "page-faults", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "faults", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MIN), "minor-faults", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MAJ), "major-faults", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "context-switches", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "cs", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "cpu-migrations", }, + {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", }, +}; + +#define __PERF_COUNTER_FIELD(config, name) \ + ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT) + +#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW) +#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG) +#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE) +#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT) + +static void display_events_help(void) +{ + unsigned int i; + __u64 e; + + printf( + " -e EVENT --event=EVENT # symbolic-name abbreviations"); + + for (i = 0; i < ARRAY_SIZE(event_symbols); i++) { + int type, id; + + e = event_symbols[i].event; + type = PERF_COUNTER_TYPE(e); + id = PERF_COUNTER_ID(e); + + printf("\n %d:%d: %-20s", + type, id, event_symbols[i].symbol); + } + + printf("\n" + " rNNN: raw PMU events (eventsel+umask)\n\n"); +} + +static void display_perfstat_help(void) +{ + printf( + "Usage: perfstat [] \n\n" + "PerfStat Options (up to %d event types can be specified):\n\n", + MAX_COUNTERS); + + display_events_help(); + + printf( + " -a # system-wide collection\n"); + exit(0); +} + +static void display_help(void) +{ + if (run_perfstat) + return display_perfstat_help(); + + printf( + "Usage: kerneltop []\n" + " Or: kerneltop -S [] COMMAND [ARGS]\n\n" + "KernelTop Options (up to %d event types can be specified at once):\n\n", + MAX_COUNTERS); + + display_events_help(); + + printf( + " -S --stat # perfstat COMMAND\n" + " -a # system-wide collection (for perfstat)\n\n" + " -c CNT --count=CNT # event period to sample\n\n" + " -C CPU --cpu=CPU # CPU (-1 for all) [default: -1]\n" + " -p PID --pid=PID # PID of sampled task (-1 for all) [default: -1]\n\n" + " -d delay --delay= # sampling/display delay [default: 2]\n" + " -f CNT --filter=CNT # min-event-count filter [default: 100]\n\n" + " -s symbol --symbol= # function to be showed annotated one-shot\n" + " -x path --vmlinux= # the vmlinux binary, required for -s use\n" + " -z --zero # zero counts after display\n" + " -D --dump_symtab # dump symbol table to stderr on startup\n" + " -m pages --mmap_pages= # number of mmap data pages\n" + ); + + exit(0); +} + +static char *event_name(int ctr) +{ + __u64 config = event_id[ctr]; + int type = PERF_COUNTER_TYPE(config); + int id = PERF_COUNTER_ID(config); + static char buf[32]; + + if (PERF_COUNTER_RAW(config)) { + sprintf(buf, "raw 0x%llx", PERF_COUNTER_CONFIG(config)); + return buf; + } + + switch (type) { + case PERF_TYPE_HARDWARE: + if (id < PERF_HW_EVENTS_MAX) + return hw_event_names[id]; + return "unknown-hardware"; + + case PERF_TYPE_SOFTWARE: + if (id < PERF_SW_EVENTS_MAX) + return sw_event_names[id]; + return "unknown-software"; + + default: + break; + } + + return "unknown"; +} + +/* + * Each event can have multiple symbolic names. + * Symbolic names are (almost) exactly matched. + */ +static __u64 match_event_symbols(char *str) +{ + __u64 config, id; + int type; + unsigned int i; + + if (sscanf(str, "r%llx", &config) == 1) + return config | PERF_COUNTER_RAW_MASK; + + if (sscanf(str, "%d:%llu", &type, &id) == 2) + return EID(type, id); + + for (i = 0; i < ARRAY_SIZE(event_symbols); i++) { + if (!strncmp(str, event_symbols[i].symbol, + strlen(event_symbols[i].symbol))) + return event_symbols[i].event; + } + + return ~0ULL; +} + +static int parse_events(char *str) +{ + __u64 config; + +again: + if (nr_counters == MAX_COUNTERS) + return -1; + + config = match_event_symbols(str); + if (config == ~0ULL) + return -1; + + event_id[nr_counters] = config; + nr_counters++; + + str = strstr(str, ","); + if (str) { + str++; + goto again; + } + + return 0; +} + + +/* + * perfstat + */ + +char fault_here[1000000]; + +static void create_perfstat_counter(int counter) +{ + struct perf_counter_hw_event hw_event; + + memset(&hw_event, 0, sizeof(hw_event)); + hw_event.config = event_id[counter]; + hw_event.record_type = PERF_RECORD_SIMPLE; + hw_event.nmi = 0; + + if (system_wide) { + int cpu; + for (cpu = 0; cpu < nr_cpus; cpu ++) { + fd[cpu][counter] = sys_perf_counter_open(&hw_event, -1, cpu, -1, 0); + if (fd[cpu][counter] < 0) { + printf("perfstat error: syscall returned with %d (%s)\n", + fd[cpu][counter], strerror(errno)); + exit(-1); + } + } + } else { + hw_event.inherit = 1; + hw_event.disabled = 1; + + fd[0][counter] = sys_perf_counter_open(&hw_event, 0, -1, -1, 0); + if (fd[0][counter] < 0) { + printf("perfstat error: syscall returned with %d (%s)\n", + fd[0][counter], strerror(errno)); + exit(-1); + } + } +} + +int do_perfstat(int argc, char *argv[]) +{ + unsigned long long t0, t1; + int counter; + ssize_t res; + int status; + int pid; + + if (!system_wide) + nr_cpus = 1; + + for (counter = 0; counter < nr_counters; counter++) + create_perfstat_counter(counter); + + argc -= optind; + argv += optind; + + if (!argc) + display_help(); + + /* + * Enable counters and exec the command: + */ + t0 = rdclock(); + prctl(PR_TASK_PERF_COUNTERS_ENABLE); + + if ((pid = fork()) < 0) + perror("failed to fork"); + if (!pid) { + if (execvp(argv[0], argv)) { + perror(argv[0]); + exit(-1); + } + } + while (wait(&status) >= 0) + ; + prctl(PR_TASK_PERF_COUNTERS_DISABLE); + t1 = rdclock(); + + fflush(stdout); + + fprintf(stderr, "\n"); + fprintf(stderr, " Performance counter stats for \'%s\':\n", + argv[0]); + fprintf(stderr, "\n"); + + for (counter = 0; counter < nr_counters; counter++) { + int cpu; + __u64 count, single_count; + + count = 0; + for (cpu = 0; cpu < nr_cpus; cpu ++) { + res = read(fd[cpu][counter], + (char *) &single_count, sizeof(single_count)); + assert(res == sizeof(single_count)); + count += single_count; + } + + if (event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK) || + event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK)) { + + double msecs = (double)count / 1000000; + + fprintf(stderr, " %14.6f %-20s (msecs)\n", + msecs, event_name(counter)); + } else { + fprintf(stderr, " %14Ld %-20s (events)\n", + count, event_name(counter)); + } + } + fprintf(stderr, "\n"); + fprintf(stderr, " Wall-clock time elapsed: %12.6f msecs\n", + (double)(t1-t0)/1e6); + fprintf(stderr, "\n"); + + return 0; +} + +/* + * Symbols + */ + +static uint64_t min_ip; +static uint64_t max_ip = -1ll; + +struct sym_entry { + unsigned long long addr; + char *sym; + unsigned long count[MAX_COUNTERS]; + int skip; + struct source_line *source; +}; + +#define MAX_SYMS 100000 + +static int sym_table_count; + +struct sym_entry *sym_filter_entry; + +static struct sym_entry sym_table[MAX_SYMS]; + +static void show_details(struct sym_entry *sym); + +/* + * Ordering weight: count-1 * count-2 * ... / count-n + */ +static double sym_weight(const struct sym_entry *sym) +{ + double weight; + int counter; + + weight = sym->count[0]; + + for (counter = 1; counter < nr_counters-1; counter++) + weight *= sym->count[counter]; + + weight /= (sym->count[counter] + 1); + + return weight; +} + +static int compare(const void *__sym1, const void *__sym2) +{ + const struct sym_entry *sym1 = __sym1, *sym2 = __sym2; + + return sym_weight(sym1) < sym_weight(sym2); +} + +static time_t last_refresh; +static long events; +static long userspace_events; +static const char CONSOLE_CLEAR[] = ""; + +static struct sym_entry tmp[MAX_SYMS]; + +static void print_sym_table(void) +{ + int i, printed; + int counter; + float events_per_sec = events/delay_secs; + float kevents_per_sec = (events-userspace_events)/delay_secs; + + memcpy(tmp, sym_table, sizeof(sym_table[0])*sym_table_count); + qsort(tmp, sym_table_count, sizeof(tmp[0]), compare); + + write(1, CONSOLE_CLEAR, strlen(CONSOLE_CLEAR)); + + printf( +"------------------------------------------------------------------------------\n"); + printf( " KernelTop:%8.0f irqs/sec kernel:%3.1f%% [%s, ", + events_per_sec, + 100.0 - (100.0*((events_per_sec-kevents_per_sec)/events_per_sec)), + nmi ? "NMI" : "IRQ"); + + if (nr_counters == 1) + printf("%d ", event_count[0]); + + for (counter = 0; counter < nr_counters; counter++) { + if (counter) + printf("/"); + + printf("%s", event_name(counter)); + } + + printf( "], "); + + if (tid != -1) + printf(" (tid: %d", tid); + else + printf(" (all"); + + if (profile_cpu != -1) + printf(", cpu: %d)\n", profile_cpu); + else { + if (tid != -1) + printf(")\n"); + else + printf(", %d CPUs)\n", nr_cpus); + } + + printf("------------------------------------------------------------------------------\n\n"); + + if (nr_counters == 1) + printf(" events"); + else + printf(" weight events"); + + printf(" RIP kernel function\n" + " ______ ______ ________________ _______________\n\n" + ); + + printed = 0; + for (i = 0; i < sym_table_count; i++) { + int count; + + if (nr_counters == 1) { + if (printed <= 18 && + tmp[i].count[0] >= count_filter) { + printf("%19.2f - %016llx : %s\n", + sym_weight(tmp + i), tmp[i].addr, tmp[i].sym); + printed++; + } + } else { + if (printed <= 18 && + tmp[i].count[0] >= count_filter) { + printf("%8.1f %10ld - %016llx : %s\n", + sym_weight(tmp + i), + tmp[i].count[0], + tmp[i].addr, tmp[i].sym); + printed++; + } + } + /* + * Add decay to the counts: + */ + for (count = 0; count < nr_counters; count++) + sym_table[i].count[count] = zero ? 0 : sym_table[i].count[count] * 7 / 8; + } + + if (sym_filter_entry) + show_details(sym_filter_entry); + + last_refresh = time(NULL); + + { + struct pollfd stdin_poll = { .fd = 0, .events = POLLIN }; + + if (poll(&stdin_poll, 1, 0) == 1) { + printf("key pressed - exiting.\n"); + exit(0); + } + } +} + +static int read_symbol(FILE *in, struct sym_entry *s) +{ + static int filter_match = 0; + char *sym, stype; + char str[500]; + int rc, pos; + + rc = fscanf(in, "%llx %c %499s", &s->addr, &stype, str); + if (rc == EOF) + return -1; + + assert(rc == 3); + + /* skip until end of line: */ + pos = strlen(str); + do { + rc = fgetc(in); + if (rc == '\n' || rc == EOF || pos >= 499) + break; + str[pos] = rc; + pos++; + } while (1); + str[pos] = 0; + + sym = str; + + /* Filter out known duplicates and non-text symbols. */ + if (!strcmp(sym, "_text")) + return 1; + if (!min_ip && !strcmp(sym, "_stext")) + return 1; + if (!strcmp(sym, "_etext") || !strcmp(sym, "_sinittext")) + return 1; + if (stype != 'T' && stype != 't') + return 1; + if (!strncmp("init_module", sym, 11) || !strncmp("cleanup_module", sym, 14)) + return 1; + if (strstr(sym, "_text_start") || strstr(sym, "_text_end")) + return 1; + + s->sym = malloc(strlen(str)); + assert(s->sym); + + strcpy((char *)s->sym, str); + s->skip = 0; + + /* Tag events to be skipped. */ + if (!strcmp("default_idle", s->sym) || !strcmp("cpu_idle", s->sym)) + s->skip = 1; + else if (!strcmp("enter_idle", s->sym) || !strcmp("exit_idle", s->sym)) + s->skip = 1; + else if (!strcmp("mwait_idle", s->sym)) + s->skip = 1; + + if (filter_match == 1) { + filter_end = s->addr; + filter_match = -1; + if (filter_end - filter_start > 10000) { + printf("hm, too large filter symbol <%s> - skipping.\n", + sym_filter); + printf("symbol filter start: %016lx\n", filter_start); + printf(" end: %016lx\n", filter_end); + filter_end = filter_start = 0; + sym_filter = NULL; + sleep(1); + } + } + if (filter_match == 0 && sym_filter && !strcmp(s->sym, sym_filter)) { + filter_match = 1; + filter_start = s->addr; + } + + return 0; +} + +int compare_addr(const void *__sym1, const void *__sym2) +{ + const struct sym_entry *sym1 = __sym1, *sym2 = __sym2; + + return sym1->addr > sym2->addr; +} + +static void sort_symbol_table(void) +{ + int i, dups; + + do { + qsort(sym_table, sym_table_count, sizeof(sym_table[0]), compare_addr); + for (i = 0, dups = 0; i < sym_table_count; i++) { + if (sym_table[i].addr == sym_table[i+1].addr) { + sym_table[i+1].addr = -1ll; + dups++; + } + } + sym_table_count -= dups; + } while(dups); +} + +static void parse_symbols(void) +{ + struct sym_entry *last; + + FILE *kallsyms = fopen("/proc/kallsyms", "r"); + + if (!kallsyms) { + printf("Could not open /proc/kallsyms - no CONFIG_KALLSYMS_ALL=y?\n"); + exit(-1); + } + + while (!feof(kallsyms)) { + if (read_symbol(kallsyms, &sym_table[sym_table_count]) == 0) { + sym_table_count++; + assert(sym_table_count <= MAX_SYMS); + } + } + + sort_symbol_table(); + min_ip = sym_table[0].addr; + max_ip = sym_table[sym_table_count-1].addr; + last = sym_table + sym_table_count++; + + last->addr = -1ll; + last->sym = ""; + + if (filter_end) { + int count; + for (count=0; count < sym_table_count; count ++) { + if (!strcmp(sym_table[count].sym, sym_filter)) { + sym_filter_entry = &sym_table[count]; + break; + } + } + } + if (dump_symtab) { + int i; + + for (i = 0; i < sym_table_count; i++) + fprintf(stderr, "%llx %s\n", + sym_table[i].addr, sym_table[i].sym); + } +} + +/* + * Source lines + */ + +static void parse_vmlinux(char *filename) +{ + FILE *file; + char command[PATH_MAX*2]; + if (!filename) + return; + + sprintf(command, "objdump --start-address=0x%016lx --stop-address=0x%016lx -dS %s", filter_start, filter_end, filename); + + file = popen(command, "r"); + if (!file) + return; + + lines_tail = &lines; + while (!feof(file)) { + struct source_line *src; + size_t dummy = 0; + char *c; + + src = malloc(sizeof(struct source_line)); + assert(src != NULL); + memset(src, 0, sizeof(struct source_line)); + + if (getline(&src->line, &dummy, file) < 0) + break; + if (!src->line) + break; + + c = strchr(src->line, '\n'); + if (c) + *c = 0; + + src->next = NULL; + *lines_tail = src; + lines_tail = &src->next; + + if (strlen(src->line)>8 && src->line[8] == ':') + src->EIP = strtoull(src->line, NULL, 16); + if (strlen(src->line)>8 && src->line[16] == ':') + src->EIP = strtoull(src->line, NULL, 16); + } + pclose(file); +} + +static void record_precise_ip(uint64_t ip) +{ + struct source_line *line; + + for (line = lines; line; line = line->next) { + if (line->EIP == ip) + line->count++; + if (line->EIP > ip) + break; + } +} + +static void lookup_sym_in_vmlinux(struct sym_entry *sym) +{ + struct source_line *line; + char pattern[PATH_MAX]; + sprintf(pattern, "<%s>:", sym->sym); + + for (line = lines; line; line = line->next) { + if (strstr(line->line, pattern)) { + sym->source = line; + break; + } + } +} + +static void show_lines(struct source_line *line_queue, int line_queue_count) +{ + int i; + struct source_line *line; + + line = line_queue; + for (i = 0; i < line_queue_count; i++) { + printf("%8li\t%s\n", line->count, line->line); + line = line->next; + } +} + +#define TRACE_COUNT 3 + +static void show_details(struct sym_entry *sym) +{ + struct source_line *line; + struct source_line *line_queue = NULL; + int displayed = 0; + int line_queue_count = 0; + + if (!sym->source) + lookup_sym_in_vmlinux(sym); + if (!sym->source) + return; + + printf("Showing details for %s\n", sym->sym); + + line = sym->source; + while (line) { + if (displayed && strstr(line->line, ">:")) + break; + + if (!line_queue_count) + line_queue = line; + line_queue_count ++; + + if (line->count >= count_filter) { + show_lines(line_queue, line_queue_count); + line_queue_count = 0; + line_queue = NULL; + } else if (line_queue_count > TRACE_COUNT) { + line_queue = line_queue->next; + line_queue_count --; + } + + line->count = 0; + displayed++; + if (displayed > 300) + break; + line = line->next; + } +} + +/* + * Binary search in the histogram table and record the hit: + */ +static void record_ip(uint64_t ip, int counter) +{ + int left_idx, middle_idx, right_idx, idx; + unsigned long left, middle, right; + + record_precise_ip(ip); + + left_idx = 0; + right_idx = sym_table_count-1; + assert(ip <= max_ip && ip >= min_ip); + + while (left_idx + 1 < right_idx) { + middle_idx = (left_idx + right_idx) / 2; + + left = sym_table[ left_idx].addr; + middle = sym_table[middle_idx].addr; + right = sym_table[ right_idx].addr; + + if (!(left <= middle && middle <= right)) { + printf("%016lx...\n%016lx...\n%016lx\n", left, middle, right); + printf("%d %d %d\n", left_idx, middle_idx, right_idx); + } + assert(left <= middle && middle <= right); + if (!(left <= ip && ip <= right)) { + printf(" left: %016lx\n", left); + printf(" ip: %016lx\n", (unsigned long)ip); + printf("right: %016lx\n", right); + } + assert(left <= ip && ip <= right); + /* + * [ left .... target .... middle .... right ] + * => right := middle + */ + if (ip < middle) { + right_idx = middle_idx; + continue; + } + /* + * [ left .... middle ... target ... right ] + * => left := middle + */ + left_idx = middle_idx; + } + + idx = left_idx; + + if (!sym_table[idx].skip) + sym_table[idx].count[counter]++; + else events--; +} + +static void process_event(uint64_t ip, int counter) +{ + events++; + + if (ip < min_ip || ip > max_ip) { + userspace_events++; + return; + } + + record_ip(ip, counter); +} + +static void process_options(int argc, char *argv[]) +{ + int error = 0, counter; + + if (strstr(argv[0], "perfstat")) + run_perfstat = 1; + + for (;;) { + int option_index = 0; + /** Options for getopt */ + static struct option long_options[] = { + {"count", required_argument, NULL, 'c'}, + {"cpu", required_argument, NULL, 'C'}, + {"delay", required_argument, NULL, 'd'}, + {"dump_symtab", no_argument, NULL, 'D'}, + {"event", required_argument, NULL, 'e'}, + {"filter", required_argument, NULL, 'f'}, + {"group", required_argument, NULL, 'g'}, + {"help", no_argument, NULL, 'h'}, + {"nmi", required_argument, NULL, 'n'}, + {"pid", required_argument, NULL, 'p'}, + {"vmlinux", required_argument, NULL, 'x'}, + {"symbol", required_argument, NULL, 's'}, + {"stat", no_argument, NULL, 'S'}, + {"zero", no_argument, NULL, 'z'}, + {"mmap_pages", required_argument, NULL, 'm'}, + {NULL, 0, NULL, 0 } + }; + int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hn:m:p:s:Sx:z", + long_options, &option_index); + if (c == -1) + break; + + switch (c) { + case 'a': system_wide = 1; break; + case 'c': default_interval = atoi(optarg); break; + case 'C': + /* CPU and PID are mutually exclusive */ + if (tid != -1) { + printf("WARNING: CPU switch overriding PID\n"); + sleep(1); + tid = -1; + } + profile_cpu = atoi(optarg); break; + case 'd': delay_secs = atoi(optarg); break; + case 'D': dump_symtab = 1; break; + + case 'e': error = parse_events(optarg); break; + + case 'f': count_filter = atoi(optarg); break; + case 'g': group = atoi(optarg); break; + case 'h': display_help(); break; + case 'n': nmi = atoi(optarg); break; + case 'p': + /* CPU and PID are mutually exclusive */ + if (profile_cpu != -1) { + printf("WARNING: PID switch overriding CPU\n"); + sleep(1); + profile_cpu = -1; + } + tid = atoi(optarg); break; + case 's': sym_filter = strdup(optarg); break; + case 'S': run_perfstat = 1; break; + case 'x': vmlinux = strdup(optarg); break; + case 'z': zero = 1; break; + case 'm': mmap_pages = atoi(optarg); break; + default: error = 1; break; + } + } + if (error) + display_help(); + + if (!nr_counters) { + if (run_perfstat) + nr_counters = 8; + else { + nr_counters = 1; + event_id[0] = 0; + } + } + + for (counter = 0; counter < nr_counters; counter++) { + if (event_count[counter]) + continue; + + event_count[counter] = default_interval; + } +} + +struct mmap_data { + int counter; + void *base; + unsigned int mask; + unsigned int prev; +}; + +static unsigned int mmap_read_head(struct mmap_data *md) +{ + struct perf_counter_mmap_page *pc = md->base; + unsigned int seq, head; + +repeat: + rmb(); + seq = pc->lock; + + if (unlikely(seq & 1)) { + cpu_relax(); + goto repeat; + } + + head = pc->data_head; + + rmb(); + if (pc->lock != seq) + goto repeat; + + return head; +} + +struct timeval last_read, this_read; + +static void mmap_read(struct mmap_data *md) +{ + unsigned int head = mmap_read_head(md); + unsigned int old = md->prev; + unsigned char *data = md->base + page_size; + int diff; + + gettimeofday(&this_read, NULL); + + /* + * If we're further behind than half the buffer, there's a chance + * the writer will bite our tail and screw up the events under us. + * + * If we somehow ended up ahead of the head, we got messed up. + * + * In either case, truncate and restart at head. + */ + diff = head - old; + if (diff > md->mask / 2 || diff < 0) { + struct timeval iv; + unsigned long msecs; + + timersub(&this_read, &last_read, &iv); + msecs = iv.tv_sec*1000 + iv.tv_usec/1000; + + fprintf(stderr, "WARNING: failed to keep up with mmap data." + " Last read %lu msecs ago.\n", msecs); + + /* + * head points to a known good entry, start there. + */ + old = head; + } + + last_read = this_read; + + for (; old != head;) { + struct event_struct { + struct perf_event_header header; + __u64 ip; + __u32 pid, tid; + } *event = (struct event_struct *)&data[old & md->mask]; + struct event_struct event_copy; + + unsigned int size = event->header.size; + + /* + * Event straddles the mmap boundary -- header should always + * be inside due to u64 alignment of output. + */ + if ((old & md->mask) + size != ((old + size) & md->mask)) { + unsigned int offset = old; + unsigned int len = sizeof(*event), cpy; + void *dst = &event_copy; + + do { + cpy = min(md->mask + 1 - (offset & md->mask), len); + memcpy(dst, &data[offset & md->mask], cpy); + offset += cpy; + dst += cpy; + len -= cpy; + } while (len); + + event = &event_copy; + } + + old += size; + + switch (event->header.type) { + case PERF_EVENT_IP: + case PERF_EVENT_IP | __PERF_EVENT_TID: + process_event(event->ip, md->counter); + break; + } + } + + md->prev = old; +} + +int main(int argc, char *argv[]) +{ + struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS]; + struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; + struct perf_counter_hw_event hw_event; + int i, counter, group_fd, nr_poll = 0; + unsigned int cpu; + int ret; + + page_size = sysconf(_SC_PAGE_SIZE); + + process_options(argc, argv); + + nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); + assert(nr_cpus <= MAX_NR_CPUS); + assert(nr_cpus >= 0); + + if (run_perfstat) + return do_perfstat(argc, argv); + + if (tid != -1 || profile_cpu != -1) + nr_cpus = 1; + + parse_symbols(); + if (vmlinux && sym_filter_entry) + parse_vmlinux(vmlinux); + + for (i = 0; i < nr_cpus; i++) { + group_fd = -1; + for (counter = 0; counter < nr_counters; counter++) { + + cpu = profile_cpu; + if (tid == -1 && profile_cpu == -1) + cpu = i; + + memset(&hw_event, 0, sizeof(hw_event)); + hw_event.config = event_id[counter]; + hw_event.irq_period = event_count[counter]; + hw_event.record_type = PERF_RECORD_IRQ; + hw_event.nmi = nmi; + hw_event.include_tid = 1; + + fd[i][counter] = sys_perf_counter_open(&hw_event, tid, cpu, group_fd, 0); + if (fd[i][counter] < 0) { + int err = errno; + printf("kerneltop error: syscall returned with %d (%s)\n", + fd[i][counter], strerror(err)); + if (err == EPERM) + printf("Are you root?\n"); + exit(-1); + } + assert(fd[i][counter] >= 0); + fcntl(fd[i][counter], F_SETFL, O_NONBLOCK); + + /* + * First counter acts as the group leader: + */ + if (group && group_fd == -1) + group_fd = fd[i][counter]; + + event_array[nr_poll].fd = fd[i][counter]; + event_array[nr_poll].events = POLLIN; + nr_poll++; + + mmap_array[i][counter].counter = counter; + mmap_array[i][counter].prev = 0; + mmap_array[i][counter].mask = mmap_pages*page_size - 1; + mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size, + PROT_READ, MAP_SHARED, fd[i][counter], 0); + if (mmap_array[i][counter].base == MAP_FAILED) { + printf("kerneltop error: failed to mmap with %d (%s)\n", + errno, strerror(errno)); + exit(-1); + } + } + } + + printf("KernelTop refresh period: %d seconds\n", delay_secs); + last_refresh = time(NULL); + + while (1) { + int hits = events; + + for (i = 0; i < nr_cpus; i++) { + for (counter = 0; counter < nr_counters; counter++) + mmap_read(&mmap_array[i][counter]); + } + + if (time(NULL) >= last_refresh + delay_secs) { + print_sym_table(); + events = userspace_events = 0; + } + + if (hits == events) + ret = poll(event_array, nr_poll, 1000); + hits = events; + } + + return 0; +} Index: linux-2.6-tip/Documentation/scheduler/00-INDEX =================================================================== --- linux-2.6-tip.orig/Documentation/scheduler/00-INDEX +++ linux-2.6-tip/Documentation/scheduler/00-INDEX @@ -2,8 +2,6 @@ - this file. sched-arch.txt - CPU Scheduler implementation hints for architecture specific code. -sched-coding.txt - - reference for various scheduler-related methods in the O(1) scheduler. sched-design-CFS.txt - goals, design and implementation of the Complete Fair Scheduler. sched-domains.txt Index: linux-2.6-tip/Documentation/scheduler/sched-coding.txt =================================================================== --- linux-2.6-tip.orig/Documentation/scheduler/sched-coding.txt +++ /dev/null @@ -1,126 +0,0 @@ - Reference for various scheduler-related methods in the O(1) scheduler - Robert Love , MontaVista Software - - -Note most of these methods are local to kernel/sched.c - this is by design. -The scheduler is meant to be self-contained and abstracted away. This document -is primarily for understanding the scheduler, not interfacing to it. Some of -the discussed interfaces, however, are general process/scheduling methods. -They are typically defined in include/linux/sched.h. - - -Main Scheduling Methods ------------------------ - -void load_balance(runqueue_t *this_rq, int idle) - Attempts to pull tasks from one cpu to another to balance cpu usage, - if needed. This method is called explicitly if the runqueues are - imbalanced or periodically by the timer tick. Prior to calling, - the current runqueue must be locked and interrupts disabled. - -void schedule() - The main scheduling function. Upon return, the highest priority - process will be active. - - -Locking -------- - -Each runqueue has its own lock, rq->lock. When multiple runqueues need -to be locked, lock acquires must be ordered by ascending &runqueue value. - -A specific runqueue is locked via - - task_rq_lock(task_t pid, unsigned long *flags) - -which disables preemption, disables interrupts, and locks the runqueue pid is -running on. Likewise, - - task_rq_unlock(task_t pid, unsigned long *flags) - -unlocks the runqueue pid is running on, restores interrupts to their previous -state, and reenables preemption. - -The routines - - double_rq_lock(runqueue_t *rq1, runqueue_t *rq2) - -and - - double_rq_unlock(runqueue_t *rq1, runqueue_t *rq2) - -safely lock and unlock, respectively, the two specified runqueues. They do -not, however, disable and restore interrupts. Users are required to do so -manually before and after calls. - - -Values ------- - -MAX_PRIO - The maximum priority of the system, stored in the task as task->prio. - Lower priorities are higher. Normal (non-RT) priorities range from - MAX_RT_PRIO to (MAX_PRIO - 1). -MAX_RT_PRIO - The maximum real-time priority of the system. Valid RT priorities - range from 0 to (MAX_RT_PRIO - 1). -MAX_USER_RT_PRIO - The maximum real-time priority that is exported to user-space. Should - always be equal to or less than MAX_RT_PRIO. Setting it less allows - kernel threads to have higher priorities than any user-space task. -MIN_TIMESLICE -MAX_TIMESLICE - Respectively, the minimum and maximum timeslices (quanta) of a process. - -Data ----- - -struct runqueue - The main per-CPU runqueue data structure. -struct task_struct - The main per-process data structure. - - -General Methods ---------------- - -cpu_rq(cpu) - Returns the runqueue of the specified cpu. -this_rq() - Returns the runqueue of the current cpu. -task_rq(pid) - Returns the runqueue which holds the specified pid. -cpu_curr(cpu) - Returns the task currently running on the given cpu. -rt_task(pid) - Returns true if pid is real-time, false if not. - - -Process Control Methods ------------------------ - -void set_user_nice(task_t *p, long nice) - Sets the "nice" value of task p to the given value. -int setscheduler(pid_t pid, int policy, struct sched_param *param) - Sets the scheduling policy and parameters for the given pid. -int set_cpus_allowed(task_t *p, unsigned long new_mask) - Sets a given task's CPU affinity and migrates it to a proper cpu. - Callers must have a valid reference to the task and assure the - task not exit prematurely. No locks can be held during the call. -set_task_state(tsk, state_value) - Sets the given task's state to the given value. -set_current_state(state_value) - Sets the current task's state to the given value. -void set_tsk_need_resched(struct task_struct *tsk) - Sets need_resched in the given task. -void clear_tsk_need_resched(struct task_struct *tsk) - Clears need_resched in the given task. -void set_need_resched() - Sets need_resched in the current task. -void clear_need_resched() - Clears need_resched in the current task. -int need_resched() - Returns true if need_resched is set in the current task, false - otherwise. -yield() - Place the current process at the end of the runqueue and call schedule. Index: linux-2.6-tip/Documentation/sysrq.txt =================================================================== --- linux-2.6-tip.orig/Documentation/sysrq.txt +++ linux-2.6-tip/Documentation/sysrq.txt @@ -113,6 +113,8 @@ On all - write a character to /proc/sys 'x' - Used by xmon interface on ppc/powerpc platforms. +'z' - Dump the ftrace buffer + '0'-'9' - Sets the console log level, controlling which kernel messages will be printed to your console. ('0', for example would make it so that only emergency messages like PANICs or OOPSes would Index: linux-2.6-tip/Documentation/tracepoints.txt =================================================================== --- linux-2.6-tip.orig/Documentation/tracepoints.txt +++ linux-2.6-tip/Documentation/tracepoints.txt @@ -45,8 +45,8 @@ In include/trace/subsys.h : #include DECLARE_TRACE(subsys_eventname, - TPPROTO(int firstarg, struct task_struct *p), - TPARGS(firstarg, p)); + TP_PROTO(int firstarg, struct task_struct *p), + TP_ARGS(firstarg, p)); In subsys/file.c (where the tracing statement must be added) : @@ -66,10 +66,10 @@ Where : - subsys is the name of your subsystem. - eventname is the name of the event to trace. -- TPPROTO(int firstarg, struct task_struct *p) is the prototype of the +- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the function called by this tracepoint. -- TPARGS(firstarg, p) are the parameters names, same as found in the +- TP_ARGS(firstarg, p) are the parameters names, same as found in the prototype. Connecting a function (probe) to a tracepoint is done by providing a @@ -103,13 +103,14 @@ used to export the defined tracepoints. * Probe / tracepoint example -See the example provided in samples/tracepoints/src +See the example provided in samples/tracepoints -Compile them with your kernel. +Compile them with your kernel. They are built during 'make' (not +'make modules') when CONFIG_SAMPLE_TRACEPOINTS=m. Run, as root : -modprobe tracepoint-example (insmod order is not important) -modprobe tracepoint-probe-example -cat /proc/tracepoint-example (returns an expected error) -rmmod tracepoint-example tracepoint-probe-example +modprobe tracepoint-sample (insmod order is not important) +modprobe tracepoint-probe-sample +cat /proc/tracepoint-sample (returns an expected error) +rmmod tracepoint-sample tracepoint-probe-sample dmesg Index: linux-2.6-tip/Documentation/vm/kmemtrace.txt =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/vm/kmemtrace.txt @@ -0,0 +1,126 @@ + kmemtrace - Kernel Memory Tracer + + by Eduard - Gabriel Munteanu + + +I. Introduction +=============== + +kmemtrace helps kernel developers figure out two things: +1) how different allocators (SLAB, SLUB etc.) perform +2) how kernel code allocates memory and how much + +To do this, we trace every allocation and export information to the userspace +through the relay interface. We export things such as the number of requested +bytes, the number of bytes actually allocated (i.e. including internal +fragmentation), whether this is a slab allocation or a plain kmalloc() and so +on. + +The actual analysis is performed by a userspace tool (see section III for +details on where to get it from). It logs the data exported by the kernel, +processes it and (as of writing this) can provide the following information: +- the total amount of memory allocated and fragmentation per call-site +- the amount of memory allocated and fragmentation per allocation +- total memory allocated and fragmentation in the collected dataset +- number of cross-CPU allocation and frees (makes sense in NUMA environments) + +Moreover, it can potentially find inconsistent and erroneous behavior in +kernel code, such as using slab free functions on kmalloc'ed memory or +allocating less memory than requested (but not truly failed allocations). + +kmemtrace also makes provisions for tracing on some arch and analysing the +data on another. + +II. Design and goals +==================== + +kmemtrace was designed to handle rather large amounts of data. Thus, it uses +the relay interface to export whatever is logged to userspace, which then +stores it. Analysis and reporting is done asynchronously, that is, after the +data is collected and stored. By design, it allows one to log and analyse +on different machines and different arches. + +As of writing this, the ABI is not considered stable, though it might not +change much. However, no guarantees are made about compatibility yet. When +deemed stable, the ABI should still allow easy extension while maintaining +backward compatibility. This is described further in Documentation/ABI. + +Summary of design goals: + - allow logging and analysis to be done across different machines + - be fast and anticipate usage in high-load environments (*) + - be reasonably extensible + - make it possible for GNU/Linux distributions to have kmemtrace + included in their repositories + +(*) - one of the reasons Pekka Enberg's original userspace data analysis + tool's code was rewritten from Perl to C (although this is more than a + simple conversion) + + +III. Quick usage guide +====================== + +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable +CONFIG_KMEMTRACE). + +2) Get the userspace tool and build it: +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository +$ cd kmemtrace-user/ +$ ./autogen.sh +$ ./configure +$ make + +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the +'single' runlevel (so that relay buffers don't fill up easily), and run +kmemtrace: +# '$' does not mean user, but root here. +$ mount -t debugfs none /sys/kernel/debug +$ mount -t proc none /proc +$ cd path/to/kmemtrace-user/ +$ ./kmemtraced +Wait a bit, then stop it with CTRL+C. +$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't + # overrun, should + # be zero. +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to + check its correctness] +$ ./kmemtrace-report + +Now you should have a nice and short summary of how the allocator performs. + +IV. FAQ and known issues +======================== + +Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix +this? Should I worry? +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how +large the number is. You can fix it by supplying a higher +'kmemtrace.subbufs=N' kernel parameter. +--- + +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? +A: This is a bug and should be reported. It can occur for a variety of +reasons: + - possible bugs in relay code + - possible misuse of relay by kmemtrace + - timestamps being collected unorderly +Or you may fix it yourself and send us a patch. +--- + +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? +A: This is a known issue and I'm working on it. These might be true errors +in kernel code, which may have inconsistent behavior (e.g. allocating memory +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed +out this behavior may work with SLAB, but may fail with other allocators. + +It may also be due to lack of tracing in some unusual allocator functions. + +We don't want bug reports regarding this issue yet. +--- + +V. See also +=========== + +Documentation/kernel-parameters.txt +Documentation/ABI/testing/debugfs-kmemtrace + Index: linux-2.6-tip/Documentation/x86/boot.txt =================================================================== --- linux-2.6-tip.orig/Documentation/x86/boot.txt +++ linux-2.6-tip/Documentation/x86/boot.txt @@ -158,7 +158,7 @@ Offset Proto Name Meaning 0202/4 2.00+ header Magic signature "HdrS" 0206/2 2.00+ version Boot protocol version supported 0208/4 2.00+ realmode_swtch Boot loader hook (see below) -020C/2 2.00+ start_sys The load-low segment (0x1000) (obsolete) +020C/2 2.00+ start_sys_seg The load-low segment (0x1000) (obsolete) 020E/2 2.00+ kernel_version Pointer to kernel version string 0210/1 2.00+ type_of_loader Boot loader identifier 0211/1 2.00+ loadflags Boot protocol option flags @@ -170,10 +170,11 @@ Offset Proto Name Meaning 0224/2 2.01+ heap_end_ptr Free memory after setup end 0226/2 N/A pad1 Unused 0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line -022C/4 2.03+ initrd_addr_max Highest legal initrd address +022C/4 2.03+ ramdisk_max Highest legal initrd address 0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not -0235/3 N/A pad2 Unused +0235/1 N/A pad2 Unused +0236/2 N/A pad3 Unused 0238/4 2.06+ cmdline_size Maximum size of the kernel command line 023C/4 2.07+ hardware_subarch Hardware subarchitecture 0240/8 2.07+ hardware_subarch_data Subarchitecture-specific data @@ -299,14 +300,14 @@ Protocol: 2.00+ e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version 10.17. -Field name: readmode_swtch +Field name: realmode_swtch Type: modify (optional) Offset/size: 0x208/4 Protocol: 2.00+ Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.) -Field name: start_sys +Field name: start_sys_seg Type: read Offset/size: 0x20c/2 Protocol: 2.00+ @@ -468,7 +469,7 @@ Protocol: 2.02+ zero, the kernel will assume that your boot loader does not support the 2.02+ protocol. -Field name: initrd_addr_max +Field name: ramdisk_max Type: read Offset/size: 0x22c/4 Protocol: 2.03+ @@ -542,7 +543,10 @@ Protocol: 2.08+ The payload may be compressed. The format of both the compressed and uncompressed data should be determined using the standard magic - numbers. Currently only gzip compressed ELF is used. + numbers. The currently supported compression formats are gzip + (magic numbers 1F 8B or 1F 9E), bzip2 (magic number 42 5A) and LZMA + (magic number 5D 00). The uncompressed payload is currently always ELF + (magic number 7F 45 4C 46). Field name: payload_length Type: read Index: linux-2.6-tip/Documentation/x86/earlyprintk.txt =================================================================== --- /dev/null +++ linux-2.6-tip/Documentation/x86/earlyprintk.txt @@ -0,0 +1,101 @@ + +Mini-HOWTO for using the earlyprintk=dbgp boot option with a +USB2 Debug port key and a debug cable, on x86 systems. + +You need two computers, the 'USB debug key' special gadget and +and two USB cables, connected like this: + + [host/target] <-------> [USB debug key] <-------> [client/console] + +1. There are three specific hardware requirements: + + a.) Host/target system needs to have USB debug port capability. + + You can check this capability by looking at a 'Debug port' bit in + the lspci -vvv output: + + # lspci -vvv + ... + 00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03) (prog-if 20 [EHCI]) + Subsystem: Lenovo ThinkPad T61 + Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- + Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- /proc/sysrq-trigger + + On the host/target system you should see this help line in "dmesg" output: + + SysRq : HELP : loglevel(0-9) reBoot Crashdump terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount show-blocked-tasks(W) dump-ftrace-buffer(Z) + + On the client/console system do: + + cat /dev/ttyUSB0 + + And you should see the help line above displayed shortly after you've + provoked it on the host system. + +If it does not work then please ask about it on the linux-kernel@vger.kernel.org +mailing list or contact the x86 maintainers. Index: linux-2.6-tip/MAINTAINERS =================================================================== --- linux-2.6-tip.orig/MAINTAINERS +++ linux-2.6-tip/MAINTAINERS @@ -1952,6 +1952,15 @@ L: linux-media@vger.kernel.org T: git kernel.org:/pub/scm/linux/kernel/git/mchehab/linux-2.6.git S: Maintained +HARDWARE LATENCY DETECTOR +P: Jon Masters +M: jcm@jonmasters.org +W: http://www.kernel.org/pub/linux/kernel/people/jcm/hwlat_detector/ +S: Supported +L: linux-kernel@vger.kernel.org +F: Documentation/hwlat_detector.txt +F: drivers/misc/hwlat_detector.c + HARDWARE MONITORING L: lm-sensors@lm-sensors.org W: http://www.lm-sensors.org/ @@ -2621,6 +2630,20 @@ M: jason.wessel@windriver.com L: kgdb-bugreport@lists.sourceforge.net S: Maintained +KMEMCHECK +P: Vegard Nossum +M: vegardno@ifi.uio.no +P Pekka Enberg +M: penberg@cs.helsinki.fi +L: linux-kernel@vger.kernel.org +S: Maintained + +KMEMTRACE +P: Eduard - Gabriel Munteanu +M: eduard.munteanu@linux360.ro +L: linux-kernel@vger.kernel.org +S: Maintained + KPROBES P: Ananth N Mavinakayanahalli M: ananth@in.ibm.com Index: linux-2.6-tip/Makefile =================================================================== --- linux-2.6-tip.orig/Makefile +++ linux-2.6-tip/Makefile @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 6 SUBLEVEL = 29 -EXTRAVERSION = .6 +EXTRAVERSION = .6-rt24 NAME = Temporary Tasmanian Devil # *DOCUMENTATION* @@ -533,8 +533,9 @@ KBUILD_CFLAGS += $(call cc-option,-Wfram endif # Force gcc to behave correct even for buggy distributions -# Arch Makefiles may override this setting +ifndef CONFIG_CC_STACKPROTECTOR KBUILD_CFLAGS += $(call cc-option, -fno-stack-protector) +endif ifdef CONFIG_FRAME_POINTER KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls @@ -551,6 +552,10 @@ ifdef CONFIG_FUNCTION_TRACER KBUILD_CFLAGS += -pg endif +ifndef CONFIG_ALLOW_WARNINGS +KBUILD_CFLAGS += -Werror ${WERROR} +endif + # We trigger additional mismatches with less inlining ifdef CONFIG_DEBUG_SECTION_MISMATCH KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once) Index: linux-2.6-tip/arch/Kconfig =================================================================== --- linux-2.6-tip.orig/arch/Kconfig +++ linux-2.6-tip/arch/Kconfig @@ -6,6 +6,7 @@ config OPROFILE tristate "OProfile system profiling (EXPERIMENTAL)" depends on PROFILING depends on HAVE_OPROFILE + depends on TRACING_SUPPORT select TRACING select RING_BUFFER help @@ -32,6 +33,11 @@ config OPROFILE_IBS config HAVE_OPROFILE bool +config PROFILE_NMI + bool + depends on OPROFILE + default y + config KPROBES bool "Kprobes" depends on KALLSYMS && MODULES @@ -106,3 +112,5 @@ config HAVE_CLK The calls support software clock gating and thus are a key power management tool on many systems. +config HAVE_DMA_API_DEBUG + bool Index: linux-2.6-tip/arch/alpha/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/alpha/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/alpha/include/asm/hardirq.h =================================================================== --- linux-2.6-tip.orig/arch/alpha/include/asm/hardirq.h +++ linux-2.6-tip/arch/alpha/include/asm/hardirq.h @@ -14,17 +14,4 @@ typedef struct { void ack_bad_irq(unsigned int irq); -#define HARDIRQ_BITS 12 - -/* - * The hardirq mask has to be large enough to have - * space for potentially nestable IRQ sources in the system - * to nest on a single CPU. On Alpha, interrupts are masked at the CPU - * by IPL as well as at the system level. We only have 8 IPLs (UNIX PALcode) - * so we really only have 8 nestable IRQs, but allow some overhead - */ -#if (1 << HARDIRQ_BITS) < 16 -#error HARDIRQ_BITS is too low! -#endif - #endif /* _ALPHA_HARDIRQ_H */ Index: linux-2.6-tip/arch/alpha/include/asm/statfs.h =================================================================== --- linux-2.6-tip.orig/arch/alpha/include/asm/statfs.h +++ linux-2.6-tip/arch/alpha/include/asm/statfs.h @@ -1,6 +1,8 @@ #ifndef _ALPHA_STATFS_H #define _ALPHA_STATFS_H +#include + /* Alpha is the only 64-bit platform with 32-bit statfs. And doesn't even seem to implement statfs64 */ #define __statfs_word __u32 Index: linux-2.6-tip/arch/alpha/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/alpha/include/asm/swab.h +++ linux-2.6-tip/arch/alpha/include/asm/swab.h @@ -1,7 +1,7 @@ #ifndef _ALPHA_SWAB_H #define _ALPHA_SWAB_H -#include +#include #include #include Index: linux-2.6-tip/arch/alpha/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/alpha/kernel/irq.c +++ linux-2.6-tip/arch/alpha/kernel/irq.c @@ -55,7 +55,7 @@ int irq_select_affinity(unsigned int irq cpu = (cpu < (NR_CPUS-1) ? cpu + 1 : 0); last_cpu = cpu; - irq_desc[irq].affinity = cpumask_of_cpu(cpu); + cpumask_copy(irq_desc[irq].affinity, cpumask_of(cpu)); irq_desc[irq].chip->set_affinity(irq, cpumask_of(cpu)); return 0; } @@ -90,7 +90,7 @@ show_interrupts(struct seq_file *p, void seq_printf(p, "%10u ", kstat_irqs(irq)); #else for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[irq]); + seq_printf(p, "%10u ", kstat_irqs_cpu(irq, j)); #endif seq_printf(p, " %14s", irq_desc[irq].chip->typename); seq_printf(p, " %c%s", Index: linux-2.6-tip/arch/alpha/kernel/irq_alpha.c =================================================================== --- linux-2.6-tip.orig/arch/alpha/kernel/irq_alpha.c +++ linux-2.6-tip/arch/alpha/kernel/irq_alpha.c @@ -64,7 +64,7 @@ do_entInt(unsigned long type, unsigned l smp_percpu_timer_interrupt(regs); cpu = smp_processor_id(); if (cpu != boot_cpuid) { - kstat_cpu(cpu).irqs[RTC_IRQ]++; + kstat_incr_irqs_this_cpu(RTC_IRQ, irq_to_desc(RTC_IRQ)); } else { handle_irq(RTC_IRQ); } Index: linux-2.6-tip/arch/alpha/mm/init.c =================================================================== --- linux-2.6-tip.orig/arch/alpha/mm/init.c +++ linux-2.6-tip/arch/alpha/mm/init.c @@ -189,9 +189,21 @@ callback_init(void * kernel_end) if (alpha_using_srm) { static struct vm_struct console_remap_vm; - unsigned long vaddr = VMALLOC_START; + unsigned long nr_pages = 0; + unsigned long vaddr; unsigned long i, j; + /* calculate needed size */ + for (i = 0; i < crb->map_entries; ++i) + nr_pages += crb->map[i].count; + + /* register the vm area */ + console_remap_vm.flags = VM_ALLOC; + console_remap_vm.size = nr_pages << PAGE_SHIFT; + vm_area_register_early(&console_remap_vm, PAGE_SIZE); + + vaddr = (unsigned long)console_remap_vm.addr; + /* Set up the third level PTEs and update the virtual addresses of the CRB entries. */ for (i = 0; i < crb->map_entries; ++i) { @@ -213,12 +225,6 @@ callback_init(void * kernel_end) vaddr += PAGE_SIZE; } } - - /* Let vmalloc know that we've allocated some space. */ - console_remap_vm.flags = VM_ALLOC; - console_remap_vm.addr = (void *) VMALLOC_START; - console_remap_vm.size = vaddr - VMALLOC_START; - vmlist = &console_remap_vm; } callback_init_done = 1; Index: linux-2.6-tip/arch/arm/include/asm/a.out.h =================================================================== --- linux-2.6-tip.orig/arch/arm/include/asm/a.out.h +++ linux-2.6-tip/arch/arm/include/asm/a.out.h @@ -2,7 +2,7 @@ #define __ARM_A_OUT_H__ #include -#include +#include struct exec { Index: linux-2.6-tip/arch/arm/include/asm/setup.h =================================================================== --- linux-2.6-tip.orig/arch/arm/include/asm/setup.h +++ linux-2.6-tip/arch/arm/include/asm/setup.h @@ -14,7 +14,7 @@ #ifndef __ASMARM_SETUP_H #define __ASMARM_SETUP_H -#include +#include #define COMMAND_LINE_SIZE 1024 Index: linux-2.6-tip/arch/arm/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/arm/include/asm/swab.h +++ linux-2.6-tip/arch/arm/include/asm/swab.h @@ -16,7 +16,7 @@ #define __ASM_ARM_SWAB_H #include -#include +#include #if !defined(__STRICT_ANSI__) || defined(__KERNEL__) # define __SWAB_64_THRU_32__ Index: linux-2.6-tip/arch/arm/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/arm/kernel/irq.c +++ linux-2.6-tip/arch/arm/kernel/irq.c @@ -76,7 +76,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%3d: ", i); for_each_present_cpu(cpu) - seq_printf(p, "%10u ", kstat_cpu(cpu).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, cpu)); seq_printf(p, " %10s", irq_desc[i].chip->name ? : "-"); seq_printf(p, " %s", action->name); for (action = action->next; action; action = action->next) @@ -101,9 +101,14 @@ unlock: /* Handle bad interrupts */ static struct irq_desc bad_irq_desc = { .handle_irq = handle_bad_irq, - .lock = __SPIN_LOCK_UNLOCKED(bad_irq_desc.lock), + .lock = RAW_SPIN_LOCK_UNLOCKED(bad_irq_desc.lock), }; +#ifdef CONFIG_CPUMASK_OFFSTACK +/* We are not allocating bad_irq_desc.affinity or .pending_mask */ +#error "ARM architecture does not support CONFIG_CPUMASK_OFFSTACK." +#endif + /* * do_IRQ handles all hardware IRQ's. Decoded IRQs should not * come via this function. Instead, they should provide their @@ -161,7 +166,7 @@ void __init init_IRQ(void) irq_desc[irq].status |= IRQ_NOREQUEST | IRQ_NOPROBE; #ifdef CONFIG_SMP - bad_irq_desc.affinity = CPU_MASK_ALL; + cpumask_setall(bad_irq_desc.affinity); bad_irq_desc.cpu = smp_processor_id(); #endif init_arch_irq(); @@ -191,15 +196,16 @@ void migrate_irqs(void) struct irq_desc *desc = irq_desc + i; if (desc->cpu == cpu) { - unsigned int newcpu = any_online_cpu(desc->affinity); - - if (newcpu == NR_CPUS) { + unsigned int newcpu = cpumask_any_and(desc->affinity, + cpu_online_mask); + if (newcpu >= nr_cpu_ids) { if (printk_ratelimit()) printk(KERN_INFO "IRQ%u no longer affine to CPU%u\n", i, cpu); - cpus_setall(desc->affinity); - newcpu = any_online_cpu(desc->affinity); + cpumask_setall(desc->affinity); + newcpu = cpumask_any_and(desc->affinity, + cpu_online_mask); } route_irq(desc, i, newcpu); Index: linux-2.6-tip/arch/arm/kernel/vmlinux.lds.S =================================================================== --- linux-2.6-tip.orig/arch/arm/kernel/vmlinux.lds.S +++ linux-2.6-tip/arch/arm/kernel/vmlinux.lds.S @@ -64,7 +64,9 @@ SECTIONS __initramfs_end = .; #endif . = ALIGN(4096); + __per_cpu_load = .; __per_cpu_start = .; + *(.data.percpu.page_aligned) *(.data.percpu) *(.data.percpu.shared_aligned) __per_cpu_end = .; Index: linux-2.6-tip/arch/arm/mach-ns9xxx/irq.c =================================================================== --- linux-2.6-tip.orig/arch/arm/mach-ns9xxx/irq.c +++ linux-2.6-tip/arch/arm/mach-ns9xxx/irq.c @@ -63,7 +63,6 @@ static struct irq_chip ns9xxx_chip = { #else static void handle_prio_irq(unsigned int irq, struct irq_desc *desc) { - unsigned int cpu = smp_processor_id(); struct irqaction *action; irqreturn_t action_ret; @@ -72,7 +71,7 @@ static void handle_prio_irq(unsigned int BUG_ON(desc->status & IRQ_INPROGRESS); desc->status &= ~(IRQ_REPLAY | IRQ_WAITING); - kstat_cpu(cpu).irqs[irq]++; + kstat_incr_irqs_this_cpu(irq, desc); action = desc->action; if (unlikely(!action || (desc->status & IRQ_DISABLED))) Index: linux-2.6-tip/arch/arm/oprofile/op_model_mpcore.c =================================================================== --- linux-2.6-tip.orig/arch/arm/oprofile/op_model_mpcore.c +++ linux-2.6-tip/arch/arm/oprofile/op_model_mpcore.c @@ -263,7 +263,7 @@ static void em_route_irq(int irq, unsign const struct cpumask *mask = cpumask_of(cpu); spin_lock_irq(&desc->lock); - desc->affinity = *mask; + cpumask_copy(desc->affinity, mask); desc->chip->set_affinity(irq, mask); spin_unlock_irq(&desc->lock); } Index: linux-2.6-tip/arch/avr32/Kconfig =================================================================== --- linux-2.6-tip.orig/arch/avr32/Kconfig +++ linux-2.6-tip/arch/avr32/Kconfig @@ -181,7 +181,7 @@ source "kernel/Kconfig.preempt" config QUICKLIST def_bool y -config HAVE_ARCH_BOOTMEM_NODE +config HAVE_ARCH_BOOTMEM def_bool n config ARCH_HAVE_MEMORY_PRESENT Index: linux-2.6-tip/arch/avr32/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/avr32/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/avr32/include/asm/hardirq.h =================================================================== --- linux-2.6-tip.orig/arch/avr32/include/asm/hardirq.h +++ linux-2.6-tip/arch/avr32/include/asm/hardirq.h @@ -20,15 +20,4 @@ void ack_bad_irq(unsigned int irq); #endif /* __ASSEMBLY__ */ -#define HARDIRQ_BITS 12 - -/* - * The hardirq mask has to be large enough to have - * space for potentially all IRQ sources in the system - * nesting on a single CPU: - */ -#if (1 << HARDIRQ_BITS) < NR_IRQS -# error HARDIRQ_BITS is too low! -#endif - #endif /* __ASM_AVR32_HARDIRQ_H */ Index: linux-2.6-tip/arch/avr32/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/avr32/include/asm/swab.h +++ linux-2.6-tip/arch/avr32/include/asm/swab.h @@ -4,7 +4,7 @@ #ifndef __ASM_AVR32_SWAB_H #define __ASM_AVR32_SWAB_H -#include +#include #include #define __SWAB_64_THRU_32__ Index: linux-2.6-tip/arch/avr32/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/avr32/kernel/irq.c +++ linux-2.6-tip/arch/avr32/kernel/irq.c @@ -58,7 +58,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%3d: ", i); for_each_online_cpu(cpu) - seq_printf(p, "%10u ", kstat_cpu(cpu).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, cpu)); seq_printf(p, " %8s", irq_desc[i].chip->name ? : "-"); seq_printf(p, " %s", action->name); for (action = action->next; action; action = action->next) Index: linux-2.6-tip/arch/blackfin/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/blackfin/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/blackfin/include/asm/percpu.h =================================================================== --- linux-2.6-tip.orig/arch/blackfin/include/asm/percpu.h +++ linux-2.6-tip/arch/blackfin/include/asm/percpu.h @@ -3,14 +3,4 @@ #include -#ifdef CONFIG_MODULES -#define PERCPU_MODULE_RESERVE 8192 -#else -#define PERCPU_MODULE_RESERVE 0 -#endif - -#define PERCPU_ENOUGH_ROOM \ - (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \ - PERCPU_MODULE_RESERVE) - #endif /* __ARCH_BLACKFIN_PERCPU__ */ Index: linux-2.6-tip/arch/blackfin/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/blackfin/include/asm/swab.h +++ linux-2.6-tip/arch/blackfin/include/asm/swab.h @@ -1,7 +1,7 @@ #ifndef _BLACKFIN_SWAB_H #define _BLACKFIN_SWAB_H -#include +#include #include #if defined(__GNUC__) && !defined(__STRICT_ANSI__) || defined(__KERNEL__) Index: linux-2.6-tip/arch/blackfin/kernel/irqchip.c =================================================================== --- linux-2.6-tip.orig/arch/blackfin/kernel/irqchip.c +++ linux-2.6-tip/arch/blackfin/kernel/irqchip.c @@ -70,6 +70,11 @@ static struct irq_desc bad_irq_desc = { #endif }; +#ifdef CONFIG_CPUMASK_OFFSTACK +/* We are not allocating a variable-sized bad_irq_desc.affinity */ +#error "Blackfin architecture does not support CONFIG_CPUMASK_OFFSTACK." +#endif + int show_interrupts(struct seq_file *p, void *v) { int i = *(loff_t *) v, j; @@ -83,7 +88,7 @@ int show_interrupts(struct seq_file *p, goto skip; seq_printf(p, "%3d: ", i); for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); seq_printf(p, " %8s", irq_desc[i].chip->name); seq_printf(p, " %s", action->name); for (action = action->next; action; action = action->next) Index: linux-2.6-tip/arch/cris/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/cris/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/cris/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/cris/kernel/irq.c +++ linux-2.6-tip/arch/cris/kernel/irq.c @@ -66,7 +66,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%10u ", kstat_irqs(i)); #else for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #endif seq_printf(p, " %14s", irq_desc[i].chip->typename); seq_printf(p, " %s", action->name); Index: linux-2.6-tip/arch/frv/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/frv/kernel/irq.c +++ linux-2.6-tip/arch/frv/kernel/irq.c @@ -74,7 +74,7 @@ int show_interrupts(struct seq_file *p, if (action) { seq_printf(p, "%3d: ", i); for_each_present_cpu(cpu) - seq_printf(p, "%10u ", kstat_cpu(cpu).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, cpu)); seq_printf(p, " %10s", irq_desc[i].chip->name ? : "-"); seq_printf(p, " %s", action->name); for (action = action->next; Index: linux-2.6-tip/arch/h8300/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/h8300/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/h8300/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/h8300/include/asm/swab.h +++ linux-2.6-tip/arch/h8300/include/asm/swab.h @@ -1,7 +1,7 @@ #ifndef _H8300_SWAB_H #define _H8300_SWAB_H -#include +#include #if defined(__GNUC__) && !defined(__STRICT_ANSI__) || defined(__KERNEL__) # define __SWAB_64_THRU_32__ Index: linux-2.6-tip/arch/h8300/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/h8300/kernel/irq.c +++ linux-2.6-tip/arch/h8300/kernel/irq.c @@ -183,7 +183,7 @@ asmlinkage void do_IRQ(int irq) #if defined(CONFIG_PROC_FS) int show_interrupts(struct seq_file *p, void *v) { - int i = *(loff_t *) v, j; + int i = *(loff_t *) v; struct irqaction * action; unsigned long flags; @@ -196,7 +196,7 @@ int show_interrupts(struct seq_file *p, if (!action) goto unlock; seq_printf(p, "%3d: ",i); - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs(i)); seq_printf(p, " %14s", irq_desc[i].chip->name); seq_printf(p, "-%-8s", irq_desc[i].name); seq_printf(p, " %s", action->name); Index: linux-2.6-tip/arch/ia64/Kconfig =================================================================== --- linux-2.6-tip.orig/arch/ia64/Kconfig +++ linux-2.6-tip/arch/ia64/Kconfig @@ -22,6 +22,9 @@ config IA64 select HAVE_OPROFILE select HAVE_KPROBES select HAVE_KRETPROBES + select HAVE_FTRACE_MCOUNT_RECORD + select HAVE_DYNAMIC_FTRACE if (!ITANIUM) + select HAVE_FUNCTION_TRACER select HAVE_DMA_ATTRS select HAVE_KVM select HAVE_ARCH_TRACEHOOK Index: linux-2.6-tip/arch/ia64/dig/Makefile =================================================================== --- linux-2.6-tip.orig/arch/ia64/dig/Makefile +++ linux-2.6-tip/arch/ia64/dig/Makefile @@ -7,8 +7,8 @@ obj-y := setup.o ifeq ($(CONFIG_DMAR), y) -obj-$(CONFIG_IA64_GENERIC) += machvec.o machvec_vtd.o dig_vtd_iommu.o +obj-$(CONFIG_IA64_GENERIC) += machvec.o machvec_vtd.o else obj-$(CONFIG_IA64_GENERIC) += machvec.o endif -obj-$(CONFIG_IA64_DIG_VTD) += dig_vtd_iommu.o + Index: linux-2.6-tip/arch/ia64/dig/dig_vtd_iommu.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/dig/dig_vtd_iommu.c +++ /dev/null @@ -1,59 +0,0 @@ -#include -#include -#include -#include - -void * -vtd_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t flags) -{ - return intel_alloc_coherent(dev, size, dma_handle, flags); -} -EXPORT_SYMBOL_GPL(vtd_alloc_coherent); - -void -vtd_free_coherent(struct device *dev, size_t size, void *vaddr, - dma_addr_t dma_handle) -{ - intel_free_coherent(dev, size, vaddr, dma_handle); -} -EXPORT_SYMBOL_GPL(vtd_free_coherent); - -dma_addr_t -vtd_map_single_attrs(struct device *dev, void *addr, size_t size, - int dir, struct dma_attrs *attrs) -{ - return intel_map_single(dev, (phys_addr_t)addr, size, dir); -} -EXPORT_SYMBOL_GPL(vtd_map_single_attrs); - -void -vtd_unmap_single_attrs(struct device *dev, dma_addr_t iova, size_t size, - int dir, struct dma_attrs *attrs) -{ - intel_unmap_single(dev, iova, size, dir); -} -EXPORT_SYMBOL_GPL(vtd_unmap_single_attrs); - -int -vtd_map_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents, - int dir, struct dma_attrs *attrs) -{ - return intel_map_sg(dev, sglist, nents, dir); -} -EXPORT_SYMBOL_GPL(vtd_map_sg_attrs); - -void -vtd_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, - int nents, int dir, struct dma_attrs *attrs) -{ - intel_unmap_sg(dev, sglist, nents, dir); -} -EXPORT_SYMBOL_GPL(vtd_unmap_sg_attrs); - -int -vtd_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return 0; -} -EXPORT_SYMBOL_GPL(vtd_dma_mapping_error); Index: linux-2.6-tip/arch/ia64/hp/common/hwsw_iommu.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/hp/common/hwsw_iommu.c +++ linux-2.6-tip/arch/ia64/hp/common/hwsw_iommu.c @@ -13,48 +13,33 @@ */ #include +#include #include - #include +extern struct dma_map_ops sba_dma_ops, swiotlb_dma_ops; + /* swiotlb declarations & definitions: */ extern int swiotlb_late_init_with_default_size (size_t size); -/* hwiommu declarations & definitions: */ - -extern ia64_mv_dma_alloc_coherent sba_alloc_coherent; -extern ia64_mv_dma_free_coherent sba_free_coherent; -extern ia64_mv_dma_map_single_attrs sba_map_single_attrs; -extern ia64_mv_dma_unmap_single_attrs sba_unmap_single_attrs; -extern ia64_mv_dma_map_sg_attrs sba_map_sg_attrs; -extern ia64_mv_dma_unmap_sg_attrs sba_unmap_sg_attrs; -extern ia64_mv_dma_supported sba_dma_supported; -extern ia64_mv_dma_mapping_error sba_dma_mapping_error; - -#define hwiommu_alloc_coherent sba_alloc_coherent -#define hwiommu_free_coherent sba_free_coherent -#define hwiommu_map_single_attrs sba_map_single_attrs -#define hwiommu_unmap_single_attrs sba_unmap_single_attrs -#define hwiommu_map_sg_attrs sba_map_sg_attrs -#define hwiommu_unmap_sg_attrs sba_unmap_sg_attrs -#define hwiommu_dma_supported sba_dma_supported -#define hwiommu_dma_mapping_error sba_dma_mapping_error -#define hwiommu_sync_single_for_cpu machvec_dma_sync_single -#define hwiommu_sync_sg_for_cpu machvec_dma_sync_sg -#define hwiommu_sync_single_for_device machvec_dma_sync_single -#define hwiommu_sync_sg_for_device machvec_dma_sync_sg - - /* * Note: we need to make the determination of whether or not to use * the sw I/O TLB based purely on the device structure. Anything else * would be unreliable or would be too intrusive. */ -static inline int -use_swiotlb (struct device *dev) +static inline int use_swiotlb(struct device *dev) +{ + return dev && dev->dma_mask && + !sba_dma_ops.dma_supported(dev, *dev->dma_mask); +} + +struct dma_map_ops *hwsw_dma_get_ops(struct device *dev) { - return dev && dev->dma_mask && !hwiommu_dma_supported(dev, *dev->dma_mask); + if (use_swiotlb(dev)) + return &swiotlb_dma_ops; + return &sba_dma_ops; } +EXPORT_SYMBOL(hwsw_dma_get_ops); void __init hwsw_init (void) @@ -71,125 +56,3 @@ hwsw_init (void) #endif } } - -void * -hwsw_alloc_coherent (struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flags) -{ - if (use_swiotlb(dev)) - return swiotlb_alloc_coherent(dev, size, dma_handle, flags); - else - return hwiommu_alloc_coherent(dev, size, dma_handle, flags); -} - -void -hwsw_free_coherent (struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle) -{ - if (use_swiotlb(dev)) - swiotlb_free_coherent(dev, size, vaddr, dma_handle); - else - hwiommu_free_coherent(dev, size, vaddr, dma_handle); -} - -dma_addr_t -hwsw_map_single_attrs(struct device *dev, void *addr, size_t size, int dir, - struct dma_attrs *attrs) -{ - if (use_swiotlb(dev)) - return swiotlb_map_single_attrs(dev, addr, size, dir, attrs); - else - return hwiommu_map_single_attrs(dev, addr, size, dir, attrs); -} -EXPORT_SYMBOL(hwsw_map_single_attrs); - -void -hwsw_unmap_single_attrs(struct device *dev, dma_addr_t iova, size_t size, - int dir, struct dma_attrs *attrs) -{ - if (use_swiotlb(dev)) - return swiotlb_unmap_single_attrs(dev, iova, size, dir, attrs); - else - return hwiommu_unmap_single_attrs(dev, iova, size, dir, attrs); -} -EXPORT_SYMBOL(hwsw_unmap_single_attrs); - -int -hwsw_map_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents, - int dir, struct dma_attrs *attrs) -{ - if (use_swiotlb(dev)) - return swiotlb_map_sg_attrs(dev, sglist, nents, dir, attrs); - else - return hwiommu_map_sg_attrs(dev, sglist, nents, dir, attrs); -} -EXPORT_SYMBOL(hwsw_map_sg_attrs); - -void -hwsw_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents, - int dir, struct dma_attrs *attrs) -{ - if (use_swiotlb(dev)) - return swiotlb_unmap_sg_attrs(dev, sglist, nents, dir, attrs); - else - return hwiommu_unmap_sg_attrs(dev, sglist, nents, dir, attrs); -} -EXPORT_SYMBOL(hwsw_unmap_sg_attrs); - -void -hwsw_sync_single_for_cpu (struct device *dev, dma_addr_t addr, size_t size, int dir) -{ - if (use_swiotlb(dev)) - swiotlb_sync_single_for_cpu(dev, addr, size, dir); - else - hwiommu_sync_single_for_cpu(dev, addr, size, dir); -} - -void -hwsw_sync_sg_for_cpu (struct device *dev, struct scatterlist *sg, int nelems, int dir) -{ - if (use_swiotlb(dev)) - swiotlb_sync_sg_for_cpu(dev, sg, nelems, dir); - else - hwiommu_sync_sg_for_cpu(dev, sg, nelems, dir); -} - -void -hwsw_sync_single_for_device (struct device *dev, dma_addr_t addr, size_t size, int dir) -{ - if (use_swiotlb(dev)) - swiotlb_sync_single_for_device(dev, addr, size, dir); - else - hwiommu_sync_single_for_device(dev, addr, size, dir); -} - -void -hwsw_sync_sg_for_device (struct device *dev, struct scatterlist *sg, int nelems, int dir) -{ - if (use_swiotlb(dev)) - swiotlb_sync_sg_for_device(dev, sg, nelems, dir); - else - hwiommu_sync_sg_for_device(dev, sg, nelems, dir); -} - -int -hwsw_dma_supported (struct device *dev, u64 mask) -{ - if (hwiommu_dma_supported(dev, mask)) - return 1; - return swiotlb_dma_supported(dev, mask); -} - -int -hwsw_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return hwiommu_dma_mapping_error(dev, dma_addr) || - swiotlb_dma_mapping_error(dev, dma_addr); -} - -EXPORT_SYMBOL(hwsw_dma_mapping_error); -EXPORT_SYMBOL(hwsw_dma_supported); -EXPORT_SYMBOL(hwsw_alloc_coherent); -EXPORT_SYMBOL(hwsw_free_coherent); -EXPORT_SYMBOL(hwsw_sync_single_for_cpu); -EXPORT_SYMBOL(hwsw_sync_single_for_device); -EXPORT_SYMBOL(hwsw_sync_sg_for_cpu); -EXPORT_SYMBOL(hwsw_sync_sg_for_device); Index: linux-2.6-tip/arch/ia64/hp/common/sba_iommu.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/hp/common/sba_iommu.c +++ linux-2.6-tip/arch/ia64/hp/common/sba_iommu.c @@ -36,6 +36,7 @@ #include /* hweight64() */ #include #include +#include #include /* ia64_get_itc() */ #include @@ -908,11 +909,13 @@ sba_mark_invalid(struct ioc *ioc, dma_ad * * See Documentation/PCI/PCI-DMA-mapping.txt */ -dma_addr_t -sba_map_single_attrs(struct device *dev, void *addr, size_t size, int dir, - struct dma_attrs *attrs) +static dma_addr_t sba_map_page(struct device *dev, struct page *page, + unsigned long poff, size_t size, + enum dma_data_direction dir, + struct dma_attrs *attrs) { struct ioc *ioc; + void *addr = page_address(page) + poff; dma_addr_t iovp; dma_addr_t offset; u64 *pdir_start; @@ -990,7 +993,14 @@ sba_map_single_attrs(struct device *dev, #endif return SBA_IOVA(ioc, iovp, offset); } -EXPORT_SYMBOL(sba_map_single_attrs); + +static dma_addr_t sba_map_single_attrs(struct device *dev, void *addr, + size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs) +{ + return sba_map_page(dev, virt_to_page(addr), + (unsigned long)addr & ~PAGE_MASK, size, dir, attrs); +} #ifdef ENABLE_MARK_CLEAN static SBA_INLINE void @@ -1026,8 +1036,8 @@ sba_mark_clean(struct ioc *ioc, dma_addr * * See Documentation/PCI/PCI-DMA-mapping.txt */ -void sba_unmap_single_attrs(struct device *dev, dma_addr_t iova, size_t size, - int dir, struct dma_attrs *attrs) +static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs) { struct ioc *ioc; #if DELAYED_RESOURCE_CNT > 0 @@ -1094,7 +1104,12 @@ void sba_unmap_single_attrs(struct devic spin_unlock_irqrestore(&ioc->res_lock, flags); #endif /* DELAYED_RESOURCE_CNT == 0 */ } -EXPORT_SYMBOL(sba_unmap_single_attrs); + +void sba_unmap_single_attrs(struct device *dev, dma_addr_t iova, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs) +{ + sba_unmap_page(dev, iova, size, dir, attrs); +} /** * sba_alloc_coherent - allocate/map shared mem for DMA @@ -1104,7 +1119,7 @@ EXPORT_SYMBOL(sba_unmap_single_attrs); * * See Documentation/PCI/PCI-DMA-mapping.txt */ -void * +static void * sba_alloc_coherent (struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flags) { struct ioc *ioc; @@ -1167,7 +1182,8 @@ sba_alloc_coherent (struct device *dev, * * See Documentation/PCI/PCI-DMA-mapping.txt */ -void sba_free_coherent (struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle) +static void sba_free_coherent (struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle) { sba_unmap_single_attrs(dev, dma_handle, size, 0, NULL); free_pages((unsigned long) vaddr, get_order(size)); @@ -1422,8 +1438,9 @@ sba_coalesce_chunks(struct ioc *ioc, str * * See Documentation/PCI/PCI-DMA-mapping.txt */ -int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents, - int dir, struct dma_attrs *attrs) +static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, + int nents, enum dma_data_direction dir, + struct dma_attrs *attrs) { struct ioc *ioc; int coalesced, filled = 0; @@ -1502,7 +1519,6 @@ int sba_map_sg_attrs(struct device *dev, return filled; } -EXPORT_SYMBOL(sba_map_sg_attrs); /** * sba_unmap_sg_attrs - unmap Scatter/Gather list @@ -1514,8 +1530,9 @@ EXPORT_SYMBOL(sba_map_sg_attrs); * * See Documentation/PCI/PCI-DMA-mapping.txt */ -void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, - int nents, int dir, struct dma_attrs *attrs) +static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, + int nents, enum dma_data_direction dir, + struct dma_attrs *attrs) { #ifdef ASSERT_PDIR_SANITY struct ioc *ioc; @@ -1551,7 +1568,6 @@ void sba_unmap_sg_attrs(struct device *d #endif } -EXPORT_SYMBOL(sba_unmap_sg_attrs); /************************************************************** * @@ -2064,6 +2080,8 @@ static struct acpi_driver acpi_sba_ioc_d }, }; +extern struct dma_map_ops swiotlb_dma_ops; + static int __init sba_init(void) { @@ -2077,6 +2095,7 @@ sba_init(void) * a successful kdump kernel boot is to use the swiotlb. */ if (is_kdump_kernel()) { + dma_ops = &swiotlb_dma_ops; if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0) panic("Unable to initialize software I/O TLB:" " Try machvec=dig boot option"); @@ -2092,6 +2111,7 @@ sba_init(void) * If we didn't find something sba_iommu can claim, we * need to setup the swiotlb and switch to the dig machvec. */ + dma_ops = &swiotlb_dma_ops; if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0) panic("Unable to find SBA IOMMU or initialize " "software I/O TLB: Try machvec=dig boot option"); @@ -2138,15 +2158,13 @@ nosbagart(char *str) return 1; } -int -sba_dma_supported (struct device *dev, u64 mask) +static int sba_dma_supported (struct device *dev, u64 mask) { /* make sure it's at least 32bit capable */ return ((mask & 0xFFFFFFFFUL) == 0xFFFFFFFFUL); } -int -sba_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) +static int sba_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) { return 0; } @@ -2176,7 +2194,22 @@ sba_page_override(char *str) __setup("sbapagesize=",sba_page_override); -EXPORT_SYMBOL(sba_dma_mapping_error); -EXPORT_SYMBOL(sba_dma_supported); -EXPORT_SYMBOL(sba_alloc_coherent); -EXPORT_SYMBOL(sba_free_coherent); +struct dma_map_ops sba_dma_ops = { + .alloc_coherent = sba_alloc_coherent, + .free_coherent = sba_free_coherent, + .map_page = sba_map_page, + .unmap_page = sba_unmap_page, + .map_sg = sba_map_sg_attrs, + .unmap_sg = sba_unmap_sg_attrs, + .sync_single_for_cpu = machvec_dma_sync_single, + .sync_sg_for_cpu = machvec_dma_sync_sg, + .sync_single_for_device = machvec_dma_sync_single, + .sync_sg_for_device = machvec_dma_sync_sg, + .dma_supported = sba_dma_supported, + .mapping_error = sba_dma_mapping_error, +}; + +void sba_dma_init(void) +{ + dma_ops = &sba_dma_ops; +} Index: linux-2.6-tip/arch/ia64/include/asm/dma-mapping.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/dma-mapping.h +++ linux-2.6-tip/arch/ia64/include/asm/dma-mapping.h @@ -11,99 +11,128 @@ #define ARCH_HAS_DMA_GET_REQUIRED_MASK -struct dma_mapping_ops { - int (*mapping_error)(struct device *dev, - dma_addr_t dma_addr); - void* (*alloc_coherent)(struct device *dev, size_t size, - dma_addr_t *dma_handle, gfp_t gfp); - void (*free_coherent)(struct device *dev, size_t size, - void *vaddr, dma_addr_t dma_handle); - dma_addr_t (*map_single)(struct device *hwdev, unsigned long ptr, - size_t size, int direction); - void (*unmap_single)(struct device *dev, dma_addr_t addr, - size_t size, int direction); - void (*sync_single_for_cpu)(struct device *hwdev, - dma_addr_t dma_handle, size_t size, - int direction); - void (*sync_single_for_device)(struct device *hwdev, - dma_addr_t dma_handle, size_t size, - int direction); - void (*sync_single_range_for_cpu)(struct device *hwdev, - dma_addr_t dma_handle, unsigned long offset, - size_t size, int direction); - void (*sync_single_range_for_device)(struct device *hwdev, - dma_addr_t dma_handle, unsigned long offset, - size_t size, int direction); - void (*sync_sg_for_cpu)(struct device *hwdev, - struct scatterlist *sg, int nelems, - int direction); - void (*sync_sg_for_device)(struct device *hwdev, - struct scatterlist *sg, int nelems, - int direction); - int (*map_sg)(struct device *hwdev, struct scatterlist *sg, - int nents, int direction); - void (*unmap_sg)(struct device *hwdev, - struct scatterlist *sg, int nents, - int direction); - int (*dma_supported_op)(struct device *hwdev, u64 mask); - int is_phys; -}; - -extern struct dma_mapping_ops *dma_ops; +extern struct dma_map_ops *dma_ops; extern struct ia64_machine_vector ia64_mv; extern void set_iommu_machvec(void); -#define dma_alloc_coherent(dev, size, handle, gfp) \ - platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA) +extern void machvec_dma_sync_single(struct device *, dma_addr_t, size_t, + enum dma_data_direction); +extern void machvec_dma_sync_sg(struct device *, struct scatterlist *, int, + enum dma_data_direction); -/* coherent mem. is cheap */ -static inline void * -dma_alloc_noncoherent(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t flag) +static inline void *dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *daddr, gfp_t gfp) { - return dma_alloc_coherent(dev, size, dma_handle, flag); + struct dma_map_ops *ops = platform_dma_get_ops(dev); + return ops->alloc_coherent(dev, size, daddr, gfp); } -#define dma_free_coherent platform_dma_free_coherent -static inline void -dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr, - dma_addr_t dma_handle) + +static inline void dma_free_coherent(struct device *dev, size_t size, + void *caddr, dma_addr_t daddr) { - dma_free_coherent(dev, size, cpu_addr, dma_handle); + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->free_coherent(dev, size, caddr, daddr); } -#define dma_map_single_attrs platform_dma_map_single_attrs -static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, - size_t size, int dir) + +#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) +#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) + +static inline dma_addr_t dma_map_single_attrs(struct device *dev, + void *caddr, size_t size, + enum dma_data_direction dir, + struct dma_attrs *attrs) { - return dma_map_single_attrs(dev, cpu_addr, size, dir, NULL); + struct dma_map_ops *ops = platform_dma_get_ops(dev); + return ops->map_page(dev, virt_to_page(caddr), + (unsigned long)caddr & ~PAGE_MASK, size, + dir, attrs); } -#define dma_map_sg_attrs platform_dma_map_sg_attrs -static inline int dma_map_sg(struct device *dev, struct scatterlist *sgl, - int nents, int dir) + +static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t daddr, + size_t size, + enum dma_data_direction dir, + struct dma_attrs *attrs) { - return dma_map_sg_attrs(dev, sgl, nents, dir, NULL); + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->unmap_page(dev, daddr, size, dir, attrs); } -#define dma_unmap_single_attrs platform_dma_unmap_single_attrs -static inline void dma_unmap_single(struct device *dev, dma_addr_t cpu_addr, - size_t size, int dir) + +#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL) +#define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, NULL) + +static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, + int nents, enum dma_data_direction dir, + struct dma_attrs *attrs) { - return dma_unmap_single_attrs(dev, cpu_addr, size, dir, NULL); + struct dma_map_ops *ops = platform_dma_get_ops(dev); + return ops->map_sg(dev, sgl, nents, dir, attrs); } -#define dma_unmap_sg_attrs platform_dma_unmap_sg_attrs -static inline void dma_unmap_sg(struct device *dev, struct scatterlist *sgl, - int nents, int dir) + +static inline void dma_unmap_sg_attrs(struct device *dev, + struct scatterlist *sgl, int nents, + enum dma_data_direction dir, + struct dma_attrs *attrs) { - return dma_unmap_sg_attrs(dev, sgl, nents, dir, NULL); + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->unmap_sg(dev, sgl, nents, dir, attrs); } -#define dma_sync_single_for_cpu platform_dma_sync_single_for_cpu -#define dma_sync_sg_for_cpu platform_dma_sync_sg_for_cpu -#define dma_sync_single_for_device platform_dma_sync_single_for_device -#define dma_sync_sg_for_device platform_dma_sync_sg_for_device -#define dma_mapping_error platform_dma_mapping_error -#define dma_map_page(dev, pg, off, size, dir) \ - dma_map_single(dev, page_address(pg) + (off), (size), (dir)) -#define dma_unmap_page(dev, dma_addr, size, dir) \ - dma_unmap_single(dev, dma_addr, size, dir) +#define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, NULL) +#define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, NULL) + +static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t daddr, + size_t size, + enum dma_data_direction dir) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->sync_single_for_cpu(dev, daddr, size, dir); +} + +static inline void dma_sync_sg_for_cpu(struct device *dev, + struct scatterlist *sgl, + int nents, enum dma_data_direction dir) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->sync_sg_for_cpu(dev, sgl, nents, dir); +} + +static inline void dma_sync_single_for_device(struct device *dev, + dma_addr_t daddr, + size_t size, + enum dma_data_direction dir) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->sync_single_for_device(dev, daddr, size, dir); +} + +static inline void dma_sync_sg_for_device(struct device *dev, + struct scatterlist *sgl, + int nents, + enum dma_data_direction dir) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + ops->sync_sg_for_device(dev, sgl, nents, dir); +} + +static inline int dma_mapping_error(struct device *dev, dma_addr_t daddr) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + return ops->mapping_error(dev, daddr); +} + +static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, + size_t offset, size_t size, + enum dma_data_direction dir) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + return ops->map_page(dev, page, offset, size, dir, NULL); +} + +static inline void dma_unmap_page(struct device *dev, dma_addr_t addr, + size_t size, enum dma_data_direction dir) +{ + dma_unmap_single(dev, addr, size, dir); +} /* * Rest of this file is part of the "Advanced DMA API". Use at your own risk. @@ -115,7 +144,11 @@ static inline void dma_unmap_sg(struct d #define dma_sync_single_range_for_device(dev, dma_handle, offset, size, dir) \ dma_sync_single_for_device(dev, dma_handle, size, dir) -#define dma_supported platform_dma_supported +static inline int dma_supported(struct device *dev, u64 mask) +{ + struct dma_map_ops *ops = platform_dma_get_ops(dev); + return ops->dma_supported(dev, mask); +} static inline int dma_set_mask (struct device *dev, u64 mask) @@ -141,11 +174,4 @@ dma_cache_sync (struct device *dev, void #define dma_is_consistent(d, h) (1) /* all we do is coherent memory... */ -static inline struct dma_mapping_ops *get_dma_ops(struct device *dev) -{ - return dma_ops; -} - - - #endif /* _ASM_IA64_DMA_MAPPING_H */ Index: linux-2.6-tip/arch/ia64/include/asm/fpu.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/fpu.h +++ linux-2.6-tip/arch/ia64/include/asm/fpu.h @@ -6,8 +6,6 @@ * David Mosberger-Tang */ -#include - /* floating point status register: */ #define FPSR_TRAP_VD (1 << 0) /* invalid op trap disabled */ #define FPSR_TRAP_DD (1 << 1) /* denormal trap disabled */ Index: linux-2.6-tip/arch/ia64/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/ia64/include/asm/ftrace.h @@ -0,0 +1,28 @@ +#ifndef _ASM_IA64_FTRACE_H +#define _ASM_IA64_FTRACE_H + +#ifdef CONFIG_FUNCTION_TRACER +#define MCOUNT_INSN_SIZE 32 /* sizeof mcount call */ + +#ifndef __ASSEMBLY__ +extern void _mcount(unsigned long pfs, unsigned long r1, unsigned long b0, unsigned long r0); +#define mcount _mcount + +#include +/* In IA64, MCOUNT_ADDR is set in link time, so it's not a constant at compile time */ +#define MCOUNT_ADDR (((struct fnptr *)mcount)->ip) +#define FTRACE_ADDR (((struct fnptr *)ftrace_caller)->ip) + +static inline unsigned long ftrace_call_adjust(unsigned long addr) +{ + /* second bundle, insn 2 */ + return addr - 0x12; +} + +struct dyn_arch_ftrace { +}; +#endif + +#endif /* CONFIG_FUNCTION_TRACER */ + +#endif /* _ASM_IA64_FTRACE_H */ Index: linux-2.6-tip/arch/ia64/include/asm/gcc_intrin.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/gcc_intrin.h +++ linux-2.6-tip/arch/ia64/include/asm/gcc_intrin.h @@ -6,6 +6,7 @@ * Copyright (C) 2002,2003 Suresh Siddha */ +#include #include /* define this macro to get some asm stmts included in 'c' files */ Index: linux-2.6-tip/arch/ia64/include/asm/hardirq.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/hardirq.h +++ linux-2.6-tip/arch/ia64/include/asm/hardirq.h @@ -20,16 +20,6 @@ #define local_softirq_pending() (local_cpu_data->softirq_pending) -#define HARDIRQ_BITS 14 - -/* - * The hardirq mask has to be large enough to have space for potentially all IRQ sources - * in the system nesting on a single CPU: - */ -#if (1 << HARDIRQ_BITS) < NR_IRQS -# error HARDIRQ_BITS is too low! -#endif - extern void __iomem *ipi_base_addr; void ack_bad_irq(unsigned int irq); Index: linux-2.6-tip/arch/ia64/include/asm/intrinsics.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/intrinsics.h +++ linux-2.6-tip/arch/ia64/include/asm/intrinsics.h @@ -10,6 +10,7 @@ #ifndef __ASSEMBLY__ +#include /* include compiler specific intrinsics */ #include #ifdef __INTEL_COMPILER Index: linux-2.6-tip/arch/ia64/include/asm/kvm.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/kvm.h +++ linux-2.6-tip/arch/ia64/include/asm/kvm.h @@ -21,8 +21,7 @@ * */ -#include - +#include #include /* Select x86 specific features in */ Index: linux-2.6-tip/arch/ia64/include/asm/machvec.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/machvec.h +++ linux-2.6-tip/arch/ia64/include/asm/machvec.h @@ -11,7 +11,6 @@ #define _ASM_IA64_MACHVEC_H #include -#include /* forward declarations: */ struct device; @@ -45,24 +44,8 @@ typedef void ia64_mv_kernel_launch_event /* DMA-mapping interface: */ typedef void ia64_mv_dma_init (void); -typedef void *ia64_mv_dma_alloc_coherent (struct device *, size_t, dma_addr_t *, gfp_t); -typedef void ia64_mv_dma_free_coherent (struct device *, size_t, void *, dma_addr_t); -typedef dma_addr_t ia64_mv_dma_map_single (struct device *, void *, size_t, int); -typedef void ia64_mv_dma_unmap_single (struct device *, dma_addr_t, size_t, int); -typedef int ia64_mv_dma_map_sg (struct device *, struct scatterlist *, int, int); -typedef void ia64_mv_dma_unmap_sg (struct device *, struct scatterlist *, int, int); -typedef void ia64_mv_dma_sync_single_for_cpu (struct device *, dma_addr_t, size_t, int); -typedef void ia64_mv_dma_sync_sg_for_cpu (struct device *, struct scatterlist *, int, int); -typedef void ia64_mv_dma_sync_single_for_device (struct device *, dma_addr_t, size_t, int); -typedef void ia64_mv_dma_sync_sg_for_device (struct device *, struct scatterlist *, int, int); -typedef int ia64_mv_dma_mapping_error(struct device *, dma_addr_t dma_addr); -typedef int ia64_mv_dma_supported (struct device *, u64); - -typedef dma_addr_t ia64_mv_dma_map_single_attrs (struct device *, void *, size_t, int, struct dma_attrs *); -typedef void ia64_mv_dma_unmap_single_attrs (struct device *, dma_addr_t, size_t, int, struct dma_attrs *); -typedef int ia64_mv_dma_map_sg_attrs (struct device *, struct scatterlist *, int, int, struct dma_attrs *); -typedef void ia64_mv_dma_unmap_sg_attrs (struct device *, struct scatterlist *, int, int, struct dma_attrs *); typedef u64 ia64_mv_dma_get_required_mask (struct device *); +typedef struct dma_map_ops *ia64_mv_dma_get_ops(struct device *); /* * WARNING: The legacy I/O space is _architected_. Platforms are @@ -114,8 +97,6 @@ machvec_noop_bus (struct pci_bus *bus) extern void machvec_setup (char **); extern void machvec_timer_interrupt (int, void *); -extern void machvec_dma_sync_single (struct device *, dma_addr_t, size_t, int); -extern void machvec_dma_sync_sg (struct device *, struct scatterlist *, int, int); extern void machvec_tlb_migrate_finish (struct mm_struct *); # if defined (CONFIG_IA64_HP_SIM) @@ -148,19 +129,8 @@ extern void machvec_tlb_migrate_finish ( # define platform_global_tlb_purge ia64_mv.global_tlb_purge # define platform_tlb_migrate_finish ia64_mv.tlb_migrate_finish # define platform_dma_init ia64_mv.dma_init -# define platform_dma_alloc_coherent ia64_mv.dma_alloc_coherent -# define platform_dma_free_coherent ia64_mv.dma_free_coherent -# define platform_dma_map_single_attrs ia64_mv.dma_map_single_attrs -# define platform_dma_unmap_single_attrs ia64_mv.dma_unmap_single_attrs -# define platform_dma_map_sg_attrs ia64_mv.dma_map_sg_attrs -# define platform_dma_unmap_sg_attrs ia64_mv.dma_unmap_sg_attrs -# define platform_dma_sync_single_for_cpu ia64_mv.dma_sync_single_for_cpu -# define platform_dma_sync_sg_for_cpu ia64_mv.dma_sync_sg_for_cpu -# define platform_dma_sync_single_for_device ia64_mv.dma_sync_single_for_device -# define platform_dma_sync_sg_for_device ia64_mv.dma_sync_sg_for_device -# define platform_dma_mapping_error ia64_mv.dma_mapping_error -# define platform_dma_supported ia64_mv.dma_supported # define platform_dma_get_required_mask ia64_mv.dma_get_required_mask +# define platform_dma_get_ops ia64_mv.dma_get_ops # define platform_irq_to_vector ia64_mv.irq_to_vector # define platform_local_vector_to_irq ia64_mv.local_vector_to_irq # define platform_pci_get_legacy_mem ia64_mv.pci_get_legacy_mem @@ -203,19 +173,8 @@ struct ia64_machine_vector { ia64_mv_global_tlb_purge_t *global_tlb_purge; ia64_mv_tlb_migrate_finish_t *tlb_migrate_finish; ia64_mv_dma_init *dma_init; - ia64_mv_dma_alloc_coherent *dma_alloc_coherent; - ia64_mv_dma_free_coherent *dma_free_coherent; - ia64_mv_dma_map_single_attrs *dma_map_single_attrs; - ia64_mv_dma_unmap_single_attrs *dma_unmap_single_attrs; - ia64_mv_dma_map_sg_attrs *dma_map_sg_attrs; - ia64_mv_dma_unmap_sg_attrs *dma_unmap_sg_attrs; - ia64_mv_dma_sync_single_for_cpu *dma_sync_single_for_cpu; - ia64_mv_dma_sync_sg_for_cpu *dma_sync_sg_for_cpu; - ia64_mv_dma_sync_single_for_device *dma_sync_single_for_device; - ia64_mv_dma_sync_sg_for_device *dma_sync_sg_for_device; - ia64_mv_dma_mapping_error *dma_mapping_error; - ia64_mv_dma_supported *dma_supported; ia64_mv_dma_get_required_mask *dma_get_required_mask; + ia64_mv_dma_get_ops *dma_get_ops; ia64_mv_irq_to_vector *irq_to_vector; ia64_mv_local_vector_to_irq *local_vector_to_irq; ia64_mv_pci_get_legacy_mem_t *pci_get_legacy_mem; @@ -254,19 +213,8 @@ struct ia64_machine_vector { platform_global_tlb_purge, \ platform_tlb_migrate_finish, \ platform_dma_init, \ - platform_dma_alloc_coherent, \ - platform_dma_free_coherent, \ - platform_dma_map_single_attrs, \ - platform_dma_unmap_single_attrs, \ - platform_dma_map_sg_attrs, \ - platform_dma_unmap_sg_attrs, \ - platform_dma_sync_single_for_cpu, \ - platform_dma_sync_sg_for_cpu, \ - platform_dma_sync_single_for_device, \ - platform_dma_sync_sg_for_device, \ - platform_dma_mapping_error, \ - platform_dma_supported, \ platform_dma_get_required_mask, \ + platform_dma_get_ops, \ platform_irq_to_vector, \ platform_local_vector_to_irq, \ platform_pci_get_legacy_mem, \ @@ -302,6 +250,9 @@ extern void machvec_init_from_cmdline(co # error Unknown configuration. Update arch/ia64/include/asm/machvec.h. # endif /* CONFIG_IA64_GENERIC */ +extern void swiotlb_dma_init(void); +extern struct dma_map_ops *dma_get_ops(struct device *); + /* * Define default versions so we can extend machvec for new platforms without having * to update the machvec files for all existing platforms. @@ -332,43 +283,10 @@ extern void machvec_init_from_cmdline(co # define platform_kernel_launch_event machvec_noop #endif #ifndef platform_dma_init -# define platform_dma_init swiotlb_init -#endif -#ifndef platform_dma_alloc_coherent -# define platform_dma_alloc_coherent swiotlb_alloc_coherent -#endif -#ifndef platform_dma_free_coherent -# define platform_dma_free_coherent swiotlb_free_coherent -#endif -#ifndef platform_dma_map_single_attrs -# define platform_dma_map_single_attrs swiotlb_map_single_attrs -#endif -#ifndef platform_dma_unmap_single_attrs -# define platform_dma_unmap_single_attrs swiotlb_unmap_single_attrs -#endif -#ifndef platform_dma_map_sg_attrs -# define platform_dma_map_sg_attrs swiotlb_map_sg_attrs -#endif -#ifndef platform_dma_unmap_sg_attrs -# define platform_dma_unmap_sg_attrs swiotlb_unmap_sg_attrs -#endif -#ifndef platform_dma_sync_single_for_cpu -# define platform_dma_sync_single_for_cpu swiotlb_sync_single_for_cpu -#endif -#ifndef platform_dma_sync_sg_for_cpu -# define platform_dma_sync_sg_for_cpu swiotlb_sync_sg_for_cpu -#endif -#ifndef platform_dma_sync_single_for_device -# define platform_dma_sync_single_for_device swiotlb_sync_single_for_device -#endif -#ifndef platform_dma_sync_sg_for_device -# define platform_dma_sync_sg_for_device swiotlb_sync_sg_for_device -#endif -#ifndef platform_dma_mapping_error -# define platform_dma_mapping_error swiotlb_dma_mapping_error +# define platform_dma_init swiotlb_dma_init #endif -#ifndef platform_dma_supported -# define platform_dma_supported swiotlb_dma_supported +#ifndef platform_dma_get_ops +# define platform_dma_get_ops dma_get_ops #endif #ifndef platform_dma_get_required_mask # define platform_dma_get_required_mask ia64_dma_get_required_mask Index: linux-2.6-tip/arch/ia64/include/asm/machvec_dig_vtd.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/machvec_dig_vtd.h +++ linux-2.6-tip/arch/ia64/include/asm/machvec_dig_vtd.h @@ -2,14 +2,6 @@ #define _ASM_IA64_MACHVEC_DIG_VTD_h extern ia64_mv_setup_t dig_setup; -extern ia64_mv_dma_alloc_coherent vtd_alloc_coherent; -extern ia64_mv_dma_free_coherent vtd_free_coherent; -extern ia64_mv_dma_map_single_attrs vtd_map_single_attrs; -extern ia64_mv_dma_unmap_single_attrs vtd_unmap_single_attrs; -extern ia64_mv_dma_map_sg_attrs vtd_map_sg_attrs; -extern ia64_mv_dma_unmap_sg_attrs vtd_unmap_sg_attrs; -extern ia64_mv_dma_supported iommu_dma_supported; -extern ia64_mv_dma_mapping_error vtd_dma_mapping_error; extern ia64_mv_dma_init pci_iommu_alloc; /* @@ -22,17 +14,5 @@ extern ia64_mv_dma_init pci_iommu_allo #define platform_name "dig_vtd" #define platform_setup dig_setup #define platform_dma_init pci_iommu_alloc -#define platform_dma_alloc_coherent vtd_alloc_coherent -#define platform_dma_free_coherent vtd_free_coherent -#define platform_dma_map_single_attrs vtd_map_single_attrs -#define platform_dma_unmap_single_attrs vtd_unmap_single_attrs -#define platform_dma_map_sg_attrs vtd_map_sg_attrs -#define platform_dma_unmap_sg_attrs vtd_unmap_sg_attrs -#define platform_dma_sync_single_for_cpu machvec_dma_sync_single -#define platform_dma_sync_sg_for_cpu machvec_dma_sync_sg -#define platform_dma_sync_single_for_device machvec_dma_sync_single -#define platform_dma_sync_sg_for_device machvec_dma_sync_sg -#define platform_dma_supported iommu_dma_supported -#define platform_dma_mapping_error vtd_dma_mapping_error #endif /* _ASM_IA64_MACHVEC_DIG_VTD_h */ Index: linux-2.6-tip/arch/ia64/include/asm/machvec_hpzx1.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/machvec_hpzx1.h +++ linux-2.6-tip/arch/ia64/include/asm/machvec_hpzx1.h @@ -2,14 +2,7 @@ #define _ASM_IA64_MACHVEC_HPZX1_h extern ia64_mv_setup_t dig_setup; -extern ia64_mv_dma_alloc_coherent sba_alloc_coherent; -extern ia64_mv_dma_free_coherent sba_free_coherent; -extern ia64_mv_dma_map_single_attrs sba_map_single_attrs; -extern ia64_mv_dma_unmap_single_attrs sba_unmap_single_attrs; -extern ia64_mv_dma_map_sg_attrs sba_map_sg_attrs; -extern ia64_mv_dma_unmap_sg_attrs sba_unmap_sg_attrs; -extern ia64_mv_dma_supported sba_dma_supported; -extern ia64_mv_dma_mapping_error sba_dma_mapping_error; +extern ia64_mv_dma_init sba_dma_init; /* * This stuff has dual use! @@ -20,18 +13,6 @@ extern ia64_mv_dma_mapping_error sba_dma */ #define platform_name "hpzx1" #define platform_setup dig_setup -#define platform_dma_init machvec_noop -#define platform_dma_alloc_coherent sba_alloc_coherent -#define platform_dma_free_coherent sba_free_coherent -#define platform_dma_map_single_attrs sba_map_single_attrs -#define platform_dma_unmap_single_attrs sba_unmap_single_attrs -#define platform_dma_map_sg_attrs sba_map_sg_attrs -#define platform_dma_unmap_sg_attrs sba_unmap_sg_attrs -#define platform_dma_sync_single_for_cpu machvec_dma_sync_single -#define platform_dma_sync_sg_for_cpu machvec_dma_sync_sg -#define platform_dma_sync_single_for_device machvec_dma_sync_single -#define platform_dma_sync_sg_for_device machvec_dma_sync_sg -#define platform_dma_supported sba_dma_supported -#define platform_dma_mapping_error sba_dma_mapping_error +#define platform_dma_init sba_dma_init #endif /* _ASM_IA64_MACHVEC_HPZX1_h */ Index: linux-2.6-tip/arch/ia64/include/asm/machvec_hpzx1_swiotlb.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/machvec_hpzx1_swiotlb.h +++ linux-2.6-tip/arch/ia64/include/asm/machvec_hpzx1_swiotlb.h @@ -2,18 +2,7 @@ #define _ASM_IA64_MACHVEC_HPZX1_SWIOTLB_h extern ia64_mv_setup_t dig_setup; -extern ia64_mv_dma_alloc_coherent hwsw_alloc_coherent; -extern ia64_mv_dma_free_coherent hwsw_free_coherent; -extern ia64_mv_dma_map_single_attrs hwsw_map_single_attrs; -extern ia64_mv_dma_unmap_single_attrs hwsw_unmap_single_attrs; -extern ia64_mv_dma_map_sg_attrs hwsw_map_sg_attrs; -extern ia64_mv_dma_unmap_sg_attrs hwsw_unmap_sg_attrs; -extern ia64_mv_dma_supported hwsw_dma_supported; -extern ia64_mv_dma_mapping_error hwsw_dma_mapping_error; -extern ia64_mv_dma_sync_single_for_cpu hwsw_sync_single_for_cpu; -extern ia64_mv_dma_sync_sg_for_cpu hwsw_sync_sg_for_cpu; -extern ia64_mv_dma_sync_single_for_device hwsw_sync_single_for_device; -extern ia64_mv_dma_sync_sg_for_device hwsw_sync_sg_for_device; +extern ia64_mv_dma_get_ops hwsw_dma_get_ops; /* * This stuff has dual use! @@ -23,20 +12,8 @@ extern ia64_mv_dma_sync_sg_for_device h * the macros are used directly. */ #define platform_name "hpzx1_swiotlb" - #define platform_setup dig_setup #define platform_dma_init machvec_noop -#define platform_dma_alloc_coherent hwsw_alloc_coherent -#define platform_dma_free_coherent hwsw_free_coherent -#define platform_dma_map_single_attrs hwsw_map_single_attrs -#define platform_dma_unmap_single_attrs hwsw_unmap_single_attrs -#define platform_dma_map_sg_attrs hwsw_map_sg_attrs -#define platform_dma_unmap_sg_attrs hwsw_unmap_sg_attrs -#define platform_dma_supported hwsw_dma_supported -#define platform_dma_mapping_error hwsw_dma_mapping_error -#define platform_dma_sync_single_for_cpu hwsw_sync_single_for_cpu -#define platform_dma_sync_sg_for_cpu hwsw_sync_sg_for_cpu -#define platform_dma_sync_single_for_device hwsw_sync_single_for_device -#define platform_dma_sync_sg_for_device hwsw_sync_sg_for_device +#define platform_dma_get_ops hwsw_dma_get_ops #endif /* _ASM_IA64_MACHVEC_HPZX1_SWIOTLB_h */ Index: linux-2.6-tip/arch/ia64/include/asm/machvec_sn2.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/machvec_sn2.h +++ linux-2.6-tip/arch/ia64/include/asm/machvec_sn2.h @@ -55,19 +55,8 @@ extern ia64_mv_readb_t __sn_readb_relaxe extern ia64_mv_readw_t __sn_readw_relaxed; extern ia64_mv_readl_t __sn_readl_relaxed; extern ia64_mv_readq_t __sn_readq_relaxed; -extern ia64_mv_dma_alloc_coherent sn_dma_alloc_coherent; -extern ia64_mv_dma_free_coherent sn_dma_free_coherent; -extern ia64_mv_dma_map_single_attrs sn_dma_map_single_attrs; -extern ia64_mv_dma_unmap_single_attrs sn_dma_unmap_single_attrs; -extern ia64_mv_dma_map_sg_attrs sn_dma_map_sg_attrs; -extern ia64_mv_dma_unmap_sg_attrs sn_dma_unmap_sg_attrs; -extern ia64_mv_dma_sync_single_for_cpu sn_dma_sync_single_for_cpu; -extern ia64_mv_dma_sync_sg_for_cpu sn_dma_sync_sg_for_cpu; -extern ia64_mv_dma_sync_single_for_device sn_dma_sync_single_for_device; -extern ia64_mv_dma_sync_sg_for_device sn_dma_sync_sg_for_device; -extern ia64_mv_dma_mapping_error sn_dma_mapping_error; -extern ia64_mv_dma_supported sn_dma_supported; extern ia64_mv_dma_get_required_mask sn_dma_get_required_mask; +extern ia64_mv_dma_init sn_dma_init; extern ia64_mv_migrate_t sn_migrate; extern ia64_mv_kernel_launch_event_t sn_kernel_launch_event; extern ia64_mv_setup_msi_irq_t sn_setup_msi_irq; @@ -111,20 +100,8 @@ extern ia64_mv_pci_fixup_bus_t sn_pci_f #define platform_pci_get_legacy_mem sn_pci_get_legacy_mem #define platform_pci_legacy_read sn_pci_legacy_read #define platform_pci_legacy_write sn_pci_legacy_write -#define platform_dma_init machvec_noop -#define platform_dma_alloc_coherent sn_dma_alloc_coherent -#define platform_dma_free_coherent sn_dma_free_coherent -#define platform_dma_map_single_attrs sn_dma_map_single_attrs -#define platform_dma_unmap_single_attrs sn_dma_unmap_single_attrs -#define platform_dma_map_sg_attrs sn_dma_map_sg_attrs -#define platform_dma_unmap_sg_attrs sn_dma_unmap_sg_attrs -#define platform_dma_sync_single_for_cpu sn_dma_sync_single_for_cpu -#define platform_dma_sync_sg_for_cpu sn_dma_sync_sg_for_cpu -#define platform_dma_sync_single_for_device sn_dma_sync_single_for_device -#define platform_dma_sync_sg_for_device sn_dma_sync_sg_for_device -#define platform_dma_mapping_error sn_dma_mapping_error -#define platform_dma_supported sn_dma_supported #define platform_dma_get_required_mask sn_dma_get_required_mask +#define platform_dma_init sn_dma_init #define platform_migrate sn_migrate #define platform_kernel_launch_event sn_kernel_launch_event #ifdef CONFIG_PCI_MSI Index: linux-2.6-tip/arch/ia64/include/asm/percpu.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/percpu.h +++ linux-2.6-tip/arch/ia64/include/asm/percpu.h @@ -27,12 +27,12 @@ extern void *per_cpu_init(void); #else /* ! SMP */ -#define PER_CPU_ATTRIBUTES __attribute__((__section__(".data.percpu"))) - #define per_cpu_init() (__phys_per_cpu_start) #endif /* SMP */ +#define PER_CPU_BASE_SECTION ".data.percpu" + /* * Be extremely careful when taking the address of this variable! Due to virtual * remapping, it is different from the canonical address returned by __get_cpu_var(var)! Index: linux-2.6-tip/arch/ia64/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/swab.h +++ linux-2.6-tip/arch/ia64/include/asm/swab.h @@ -6,7 +6,7 @@ * David Mosberger-Tang , Hewlett-Packard Co. */ -#include +#include #include #include Index: linux-2.6-tip/arch/ia64/include/asm/topology.h =================================================================== --- linux-2.6-tip.orig/arch/ia64/include/asm/topology.h +++ linux-2.6-tip/arch/ia64/include/asm/topology.h @@ -84,7 +84,7 @@ void build_cpu_to_node_map(void); .child = NULL, \ .groups = NULL, \ .min_interval = 8, \ - .max_interval = 8*(min(num_online_cpus(), 32)), \ + .max_interval = 8*(min(num_online_cpus(), 32U)), \ .busy_factor = 64, \ .imbalance_pct = 125, \ .cache_nice_tries = 2, \ Index: linux-2.6-tip/arch/ia64/include/asm/uv/uv.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/ia64/include/asm/uv/uv.h @@ -0,0 +1,13 @@ +#ifndef _ASM_IA64_UV_UV_H +#define _ASM_IA64_UV_UV_H + +#include +#include + +static inline int is_uv_system(void) +{ + /* temporary support for running on hardware simulator */ + return IS_MEDUSA() || ia64_platform_is("uv"); +} + +#endif /* _ASM_IA64_UV_UV_H */ Index: linux-2.6-tip/arch/ia64/kernel/Makefile =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/Makefile +++ linux-2.6-tip/arch/ia64/kernel/Makefile @@ -2,12 +2,16 @@ # Makefile for the linux kernel. # +ifdef CONFIG_DYNAMIC_FTRACE +CFLAGS_REMOVE_ftrace.o = -pg +endif + extra-y := head.o init_task.o vmlinux.lds obj-y := acpi.o entry.o efi.o efi_stub.o gate-data.o fsys.o ia64_ksyms.o irq.o irq_ia64.o \ irq_lsapic.o ivt.o machvec.o pal.o patch.o process.o perfmon.o ptrace.o sal.o \ salinfo.o setup.o signal.o sys_ia64.o time.o traps.o unaligned.o \ - unwind.o mca.o mca_asm.o topology.o + unwind.o mca.o mca_asm.o topology.o dma-mapping.o obj-$(CONFIG_IA64_BRL_EMU) += brl_emu.o obj-$(CONFIG_IA64_GENERIC) += acpi-ext.o @@ -28,6 +32,7 @@ obj-$(CONFIG_IA64_CYCLONE) += cyclone.o obj-$(CONFIG_CPU_FREQ) += cpufreq/ obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o +obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR) += uncached.o @@ -43,9 +48,7 @@ ifneq ($(CONFIG_IA64_ESI),) obj-y += esi_stub.o # must be in kernel proper endif obj-$(CONFIG_DMAR) += pci-dma.o -ifeq ($(CONFIG_DMAR), y) obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o -endif # The gate DSO image is built using a special linker script. targets += gate.so gate-syms.o Index: linux-2.6-tip/arch/ia64/kernel/acpi.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/acpi.c +++ linux-2.6-tip/arch/ia64/kernel/acpi.c @@ -199,6 +199,10 @@ char *__init __acpi_map_table(unsigned l return __va(phys_addr); } +void __init __acpi_unmap_table(char *map, unsigned long size) +{ +} + /* -------------------------------------------------------------------------- Boot-time Table Parsing -------------------------------------------------------------------------- */ Index: linux-2.6-tip/arch/ia64/kernel/dma-mapping.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/ia64/kernel/dma-mapping.c @@ -0,0 +1,13 @@ +#include + +/* Set this to 1 if there is a HW IOMMU in the system */ +int iommu_detected __read_mostly; + +struct dma_map_ops *dma_ops; +EXPORT_SYMBOL(dma_ops); + +struct dma_map_ops *dma_get_ops(struct device *dev) +{ + return dma_ops; +} +EXPORT_SYMBOL(dma_get_ops); Index: linux-2.6-tip/arch/ia64/kernel/entry.S =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/entry.S +++ linux-2.6-tip/arch/ia64/kernel/entry.S @@ -47,6 +47,7 @@ #include #include #include +#include #include "minstate.h" @@ -1404,6 +1405,105 @@ GLOBAL_ENTRY(unw_init_running) br.ret.sptk.many rp END(unw_init_running) +#ifdef CONFIG_FUNCTION_TRACER +#ifdef CONFIG_DYNAMIC_FTRACE +GLOBAL_ENTRY(_mcount) + br ftrace_stub +END(_mcount) + +.here: + br.ret.sptk.many b0 + +GLOBAL_ENTRY(ftrace_caller) + alloc out0 = ar.pfs, 8, 0, 4, 0 + mov out3 = r0 + ;; + mov out2 = b0 + add r3 = 0x20, r3 + mov out1 = r1; + br.call.sptk.many b0 = ftrace_patch_gp + //this might be called from module, so we must patch gp +ftrace_patch_gp: + movl gp=__gp + mov b0 = r3 + ;; +.global ftrace_call; +ftrace_call: +{ + .mlx + nop.m 0x0 + movl r3 = .here;; +} + alloc loc0 = ar.pfs, 4, 4, 2, 0 + ;; + mov loc1 = b0 + mov out0 = b0 + mov loc2 = r8 + mov loc3 = r15 + ;; + adds out0 = -MCOUNT_INSN_SIZE, out0 + mov out1 = in2 + mov b6 = r3 + + br.call.sptk.many b0 = b6 + ;; + mov ar.pfs = loc0 + mov b0 = loc1 + mov r8 = loc2 + mov r15 = loc3 + br ftrace_stub + ;; +END(ftrace_caller) + +#else +GLOBAL_ENTRY(_mcount) + movl r2 = ftrace_stub + movl r3 = ftrace_trace_function;; + ld8 r3 = [r3];; + ld8 r3 = [r3];; + cmp.eq p7,p0 = r2, r3 +(p7) br.sptk.many ftrace_stub + ;; + + alloc loc0 = ar.pfs, 4, 4, 2, 0 + ;; + mov loc1 = b0 + mov out0 = b0 + mov loc2 = r8 + mov loc3 = r15 + ;; + adds out0 = -MCOUNT_INSN_SIZE, out0 + mov out1 = in2 + mov b6 = r3 + + br.call.sptk.many b0 = b6 + ;; + mov ar.pfs = loc0 + mov b0 = loc1 + mov r8 = loc2 + mov r15 = loc3 + br ftrace_stub + ;; +END(_mcount) +#endif + +GLOBAL_ENTRY(ftrace_stub) + mov r3 = b0 + movl r2 = _mcount_ret_helper + ;; + mov b6 = r2 + mov b7 = r3 + br.ret.sptk.many b6 + +_mcount_ret_helper: + mov b0 = r42 + mov r1 = r41 + mov ar.pfs = r40 + br b7 +END(ftrace_stub) + +#endif /* CONFIG_FUNCTION_TRACER */ + .rodata .align 8 .globl sys_call_table Index: linux-2.6-tip/arch/ia64/kernel/ftrace.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/ia64/kernel/ftrace.c @@ -0,0 +1,206 @@ +/* + * Dynamic function tracing support. + * + * Copyright (C) 2008 Shaohua Li + * + * For licencing details, see COPYING. + * + * Defines low-level handling of mcount calls when the kernel + * is compiled with the -pg flag. When using dynamic ftrace, the + * mcount call-sites get patched lazily with NOP till they are + * enabled. All code mutation routines here take effect atomically. + */ + +#include +#include + +#include +#include + +/* In IA64, each function will be added below two bundles with -pg option */ +static unsigned char __attribute__((aligned(8))) +ftrace_orig_code[MCOUNT_INSN_SIZE] = { + 0x02, 0x40, 0x31, 0x10, 0x80, 0x05, /* alloc r40=ar.pfs,12,8,0 */ + 0xb0, 0x02, 0x00, 0x00, 0x42, 0x40, /* mov r43=r0;; */ + 0x05, 0x00, 0xc4, 0x00, /* mov r42=b0 */ + 0x11, 0x48, 0x01, 0x02, 0x00, 0x21, /* mov r41=r1 */ + 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, /* nop.i 0x0 */ + 0x08, 0x00, 0x00, 0x50 /* br.call.sptk.many b0 = _mcount;; */ +}; + +struct ftrace_orig_insn { + u64 dummy1, dummy2, dummy3; + u64 dummy4:64-41+13; + u64 imm20:20; + u64 dummy5:3; + u64 sign:1; + u64 dummy6:4; +}; + +/* mcount stub will be converted below for nop */ +static unsigned char ftrace_nop_code[MCOUNT_INSN_SIZE] = { + 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MII] nop.m 0x0 */ + 0x30, 0x00, 0x00, 0x60, 0x00, 0x00, /* mov r3=ip */ + 0x00, 0x00, 0x04, 0x00, /* nop.i 0x0 */ + 0x05, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MLX] nop.m 0x0 */ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* nop.x 0x0;; */ + 0x00, 0x00, 0x04, 0x00 +}; + +static unsigned char *ftrace_nop_replace(void) +{ + return ftrace_nop_code; +} + +/* + * mcount stub will be converted below for call + * Note: Just the last instruction is changed against nop + * */ +static unsigned char __attribute__((aligned(8))) +ftrace_call_code[MCOUNT_INSN_SIZE] = { + 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MII] nop.m 0x0 */ + 0x30, 0x00, 0x00, 0x60, 0x00, 0x00, /* mov r3=ip */ + 0x00, 0x00, 0x04, 0x00, /* nop.i 0x0 */ + 0x05, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MLX] nop.m 0x0 */ + 0xff, 0xff, 0xff, 0xff, 0x7f, 0x00, /* brl.many .;;*/ + 0xf8, 0xff, 0xff, 0xc8 +}; + +struct ftrace_call_insn { + u64 dummy1, dummy2; + u64 dummy3:48; + u64 imm39_l:16; + u64 imm39_h:23; + u64 dummy4:13; + u64 imm20:20; + u64 dummy5:3; + u64 i:1; + u64 dummy6:4; +}; + +static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr) +{ + struct ftrace_call_insn *code = (void *)ftrace_call_code; + unsigned long offset = addr - (ip + 0x10); + + code->imm39_l = offset >> 24; + code->imm39_h = offset >> 40; + code->imm20 = offset >> 4; + code->i = offset >> 63; + return ftrace_call_code; +} + +static int +ftrace_modify_code(unsigned long ip, unsigned char *old_code, + unsigned char *new_code, int do_check) +{ + unsigned char replaced[MCOUNT_INSN_SIZE]; + + /* + * Note: Due to modules and __init, code can + * disappear and change, we need to protect against faulting + * as well as code changing. We do this by using the + * probe_kernel_* functions. + * + * No real locking needed, this code is run through + * kstop_machine, or before SMP starts. + */ + + if (!do_check) + goto skip_check; + + /* read the text we want to modify */ + if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE)) + return -EFAULT; + + /* Make sure it is what we expect it to be */ + if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0) + return -EINVAL; + +skip_check: + /* replace the text with the new text */ + if (probe_kernel_write(((void *)ip), new_code, MCOUNT_INSN_SIZE)) + return -EPERM; + flush_icache_range(ip, ip + MCOUNT_INSN_SIZE); + + return 0; +} + +static int ftrace_make_nop_check(struct dyn_ftrace *rec, unsigned long addr) +{ + unsigned char __attribute__((aligned(8))) replaced[MCOUNT_INSN_SIZE]; + unsigned long ip = rec->ip; + + if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE)) + return -EFAULT; + if (rec->flags & FTRACE_FL_CONVERTED) { + struct ftrace_call_insn *call_insn, *tmp_call; + + call_insn = (void *)ftrace_call_code; + tmp_call = (void *)replaced; + call_insn->imm39_l = tmp_call->imm39_l; + call_insn->imm39_h = tmp_call->imm39_h; + call_insn->imm20 = tmp_call->imm20; + call_insn->i = tmp_call->i; + if (memcmp(replaced, ftrace_call_code, MCOUNT_INSN_SIZE) != 0) + return -EINVAL; + return 0; + } else { + struct ftrace_orig_insn *call_insn, *tmp_call; + + call_insn = (void *)ftrace_orig_code; + tmp_call = (void *)replaced; + call_insn->sign = tmp_call->sign; + call_insn->imm20 = tmp_call->imm20; + if (memcmp(replaced, ftrace_orig_code, MCOUNT_INSN_SIZE) != 0) + return -EINVAL; + return 0; + } +} + +int ftrace_make_nop(struct module *mod, + struct dyn_ftrace *rec, unsigned long addr) +{ + int ret; + char *new; + + ret = ftrace_make_nop_check(rec, addr); + if (ret) + return ret; + new = ftrace_nop_replace(); + return ftrace_modify_code(rec->ip, NULL, new, 0); +} + +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) +{ + unsigned long ip = rec->ip; + unsigned char *old, *new; + + old= ftrace_nop_replace(); + new = ftrace_call_replace(ip, addr); + return ftrace_modify_code(ip, old, new, 1); +} + +/* in IA64, _mcount can't directly call ftrace_stub. Only jump is ok */ +int ftrace_update_ftrace_func(ftrace_func_t func) +{ + unsigned long ip; + unsigned long addr = ((struct fnptr *)ftrace_call)->ip; + + if (func == ftrace_stub) + return 0; + ip = ((struct fnptr *)func)->ip; + + ia64_patch_imm64(addr + 2, ip); + + flush_icache_range(addr, addr + 16); + return 0; +} + +/* run from kstop_machine */ +int __init ftrace_dyn_arch_init(void *data) +{ + *(unsigned long *)data = 0; + + return 0; +} Index: linux-2.6-tip/arch/ia64/kernel/ia64_ksyms.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/ia64_ksyms.c +++ linux-2.6-tip/arch/ia64/kernel/ia64_ksyms.c @@ -112,3 +112,9 @@ EXPORT_SYMBOL_GPL(esi_call_phys); #endif extern char ia64_ivt[]; EXPORT_SYMBOL(ia64_ivt); + +#include +#ifdef CONFIG_FUNCTION_TRACER +/* mcount is defined in assembly */ +EXPORT_SYMBOL(_mcount); +#endif Index: linux-2.6-tip/arch/ia64/kernel/iosapic.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/iosapic.c +++ linux-2.6-tip/arch/ia64/kernel/iosapic.c @@ -880,7 +880,7 @@ iosapic_unregister_intr (unsigned int gs if (iosapic_intr_info[irq].count == 0) { #ifdef CONFIG_SMP /* Clear affinity */ - cpus_setall(idesc->affinity); + cpumask_setall(idesc->affinity); #endif /* Clear the interrupt information */ iosapic_intr_info[irq].dest = 0; Index: linux-2.6-tip/arch/ia64/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/irq.c +++ linux-2.6-tip/arch/ia64/kernel/irq.c @@ -80,7 +80,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%10u ", kstat_irqs(i)); #else for_each_online_cpu(j) { - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); } #endif seq_printf(p, " %14s", irq_desc[i].chip->name); @@ -103,7 +103,7 @@ static char irq_redir [NR_IRQS]; // = { void set_irq_affinity_info (unsigned int irq, int hwid, int redir) { if (irq < NR_IRQS) { - cpumask_copy(&irq_desc[irq].affinity, + cpumask_copy(irq_desc[irq].affinity, cpumask_of(cpu_logical_id(hwid))); irq_redir[irq] = (char) (redir & 0xff); } @@ -148,7 +148,7 @@ static void migrate_irqs(void) if (desc->status == IRQ_PER_CPU) continue; - if (cpumask_any_and(&irq_desc[irq].affinity, cpu_online_mask) + if (cpumask_any_and(irq_desc[irq].affinity, cpu_online_mask) >= nr_cpu_ids) { /* * Save it for phase 2 processing Index: linux-2.6-tip/arch/ia64/kernel/irq_ia64.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/irq_ia64.c +++ linux-2.6-tip/arch/ia64/kernel/irq_ia64.c @@ -493,11 +493,13 @@ ia64_handle_irq (ia64_vector vector, str saved_tpr = ia64_getreg(_IA64_REG_CR_TPR); ia64_srlz_d(); while (vector != IA64_SPURIOUS_INT_VECTOR) { + struct irq_desc *desc = irq_to_desc(vector); + if (unlikely(IS_LOCAL_TLB_FLUSH(vector))) { smp_local_flush_tlb(); - kstat_this_cpu.irqs[vector]++; + kstat_incr_irqs_this_cpu(vector, desc); } else if (unlikely(IS_RESCHEDULE(vector))) - kstat_this_cpu.irqs[vector]++; + kstat_incr_irqs_this_cpu(vector, desc); else { int irq = local_vector_to_irq(vector); @@ -551,11 +553,13 @@ void ia64_process_pending_intr(void) * Perform normal interrupt style processing */ while (vector != IA64_SPURIOUS_INT_VECTOR) { + struct irq_desc *desc = irq_to_desc(vector); + if (unlikely(IS_LOCAL_TLB_FLUSH(vector))) { smp_local_flush_tlb(); - kstat_this_cpu.irqs[vector]++; + kstat_incr_irqs_this_cpu(vector, desc); } else if (unlikely(IS_RESCHEDULE(vector))) - kstat_this_cpu.irqs[vector]++; + kstat_incr_irqs_this_cpu(vector, desc); else { struct pt_regs *old_regs = set_irq_regs(NULL); int irq = local_vector_to_irq(vector); Index: linux-2.6-tip/arch/ia64/kernel/machvec.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/machvec.c +++ linux-2.6-tip/arch/ia64/kernel/machvec.c @@ -1,5 +1,5 @@ #include - +#include #include #include @@ -75,14 +75,16 @@ machvec_timer_interrupt (int irq, void * EXPORT_SYMBOL(machvec_timer_interrupt); void -machvec_dma_sync_single (struct device *hwdev, dma_addr_t dma_handle, size_t size, int dir) +machvec_dma_sync_single(struct device *hwdev, dma_addr_t dma_handle, size_t size, + enum dma_data_direction dir) { mb(); } EXPORT_SYMBOL(machvec_dma_sync_single); void -machvec_dma_sync_sg (struct device *hwdev, struct scatterlist *sg, int n, int dir) +machvec_dma_sync_sg(struct device *hwdev, struct scatterlist *sg, int n, + enum dma_data_direction dir) { mb(); } Index: linux-2.6-tip/arch/ia64/kernel/msi_ia64.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/msi_ia64.c +++ linux-2.6-tip/arch/ia64/kernel/msi_ia64.c @@ -75,7 +75,7 @@ static void ia64_set_msi_irq_affinity(un msg.data = data; write_msi_msg(irq, &msg); - irq_desc[irq].affinity = cpumask_of_cpu(cpu); + cpumask_copy(irq_desc[irq].affinity, cpumask_of(cpu)); } #endif /* CONFIG_SMP */ @@ -187,7 +187,7 @@ static void dmar_msi_set_affinity(unsign msg.address_lo |= MSI_ADDR_DESTID_CPU(cpu_physical_id(cpu)); dmar_msi_write(irq, &msg); - irq_desc[irq].affinity = *mask; + cpumask_copy(irq_desc[irq].affinity, mask); } #endif /* CONFIG_SMP */ Index: linux-2.6-tip/arch/ia64/kernel/pci-dma.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/pci-dma.c +++ linux-2.6-tip/arch/ia64/kernel/pci-dma.c @@ -32,9 +32,6 @@ int force_iommu __read_mostly = 1; int force_iommu __read_mostly; #endif -/* Set this to 1 if there is a HW IOMMU in the system */ -int iommu_detected __read_mostly; - /* Dummy device used for NULL arguments (normally ISA). Better would be probably a smaller DMA mask, but this is bug-to-bug compatible to i386. */ @@ -44,18 +41,7 @@ struct device fallback_dev = { .dma_mask = &fallback_dev.coherent_dma_mask, }; -void __init pci_iommu_alloc(void) -{ - /* - * The order of these functions is important for - * fall-back/fail-over reasons - */ - detect_intel_iommu(); - -#ifdef CONFIG_SWIOTLB - pci_swiotlb_init(); -#endif -} +extern struct dma_map_ops intel_dma_ops; static int __init pci_iommu_init(void) { @@ -79,15 +65,12 @@ iommu_dma_init(void) return; } -struct dma_mapping_ops *dma_ops; -EXPORT_SYMBOL(dma_ops); - int iommu_dma_supported(struct device *dev, u64 mask) { - struct dma_mapping_ops *ops = get_dma_ops(dev); + struct dma_map_ops *ops = platform_dma_get_ops(dev); - if (ops->dma_supported_op) - return ops->dma_supported_op(dev, mask); + if (ops->dma_supported) + return ops->dma_supported(dev, mask); /* Copied from i386. Doesn't make much sense, because it will only work for pci_alloc_coherent. @@ -116,4 +99,25 @@ int iommu_dma_supported(struct device *d } EXPORT_SYMBOL(iommu_dma_supported); +void __init pci_iommu_alloc(void) +{ + dma_ops = &intel_dma_ops; + + dma_ops->sync_single_for_cpu = machvec_dma_sync_single; + dma_ops->sync_sg_for_cpu = machvec_dma_sync_sg; + dma_ops->sync_single_for_device = machvec_dma_sync_single; + dma_ops->sync_sg_for_device = machvec_dma_sync_sg; + dma_ops->dma_supported = iommu_dma_supported; + + /* + * The order of these functions is important for + * fall-back/fail-over reasons + */ + detect_intel_iommu(); + +#ifdef CONFIG_SWIOTLB + pci_swiotlb_init(); +#endif +} + #endif Index: linux-2.6-tip/arch/ia64/kernel/pci-swiotlb.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/pci-swiotlb.c +++ linux-2.6-tip/arch/ia64/kernel/pci-swiotlb.c @@ -13,23 +13,37 @@ int swiotlb __read_mostly; EXPORT_SYMBOL(swiotlb); -struct dma_mapping_ops swiotlb_dma_ops = { - .mapping_error = swiotlb_dma_mapping_error, - .alloc_coherent = swiotlb_alloc_coherent, +static void *ia64_swiotlb_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp) +{ + if (dev->coherent_dma_mask != DMA_64BIT_MASK) + gfp |= GFP_DMA; + return swiotlb_alloc_coherent(dev, size, dma_handle, gfp); +} + +struct dma_map_ops swiotlb_dma_ops = { + .alloc_coherent = ia64_swiotlb_alloc_coherent, .free_coherent = swiotlb_free_coherent, - .map_single = swiotlb_map_single, - .unmap_single = swiotlb_unmap_single, + .map_page = swiotlb_map_page, + .unmap_page = swiotlb_unmap_page, + .map_sg = swiotlb_map_sg_attrs, + .unmap_sg = swiotlb_unmap_sg_attrs, .sync_single_for_cpu = swiotlb_sync_single_for_cpu, .sync_single_for_device = swiotlb_sync_single_for_device, .sync_single_range_for_cpu = swiotlb_sync_single_range_for_cpu, .sync_single_range_for_device = swiotlb_sync_single_range_for_device, .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu, .sync_sg_for_device = swiotlb_sync_sg_for_device, - .map_sg = swiotlb_map_sg, - .unmap_sg = swiotlb_unmap_sg, - .dma_supported_op = swiotlb_dma_supported, + .dma_supported = swiotlb_dma_supported, + .mapping_error = swiotlb_dma_mapping_error, }; +void __init swiotlb_dma_init(void) +{ + dma_ops = &swiotlb_dma_ops; + swiotlb_init(); +} + void __init pci_swiotlb_init(void) { if (!iommu_detected) { Index: linux-2.6-tip/arch/ia64/kernel/vmlinux.lds.S =================================================================== --- linux-2.6-tip.orig/arch/ia64/kernel/vmlinux.lds.S +++ linux-2.6-tip/arch/ia64/kernel/vmlinux.lds.S @@ -213,16 +213,9 @@ SECTIONS { *(.data.cacheline_aligned) } /* Per-cpu data: */ - percpu : { } :percpu . = ALIGN(PERCPU_PAGE_SIZE); - __phys_per_cpu_start = .; - .data.percpu PERCPU_ADDR : AT(__phys_per_cpu_start - LOAD_OFFSET) - { - __per_cpu_start = .; - *(.data.percpu) - *(.data.percpu.shared_aligned) - __per_cpu_end = .; - } + PERCPU_VADDR(PERCPU_ADDR, :percpu) + __phys_per_cpu_start = __per_cpu_load; . = __phys_per_cpu_start + PERCPU_PAGE_SIZE; /* ensure percpu data fits * into percpu page size */ Index: linux-2.6-tip/arch/ia64/sn/kernel/msi_sn.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/sn/kernel/msi_sn.c +++ linux-2.6-tip/arch/ia64/sn/kernel/msi_sn.c @@ -205,7 +205,7 @@ static void sn_set_msi_irq_affinity(unsi msg.address_lo = (u32)(bus_addr & 0x00000000ffffffff); write_msi_msg(irq, &msg); - irq_desc[irq].affinity = *cpu_mask; + cpumask_copy(irq_desc[irq].affinity, cpu_mask); } #endif /* CONFIG_SMP */ Index: linux-2.6-tip/arch/ia64/sn/pci/pci_dma.c =================================================================== --- linux-2.6-tip.orig/arch/ia64/sn/pci/pci_dma.c +++ linux-2.6-tip/arch/ia64/sn/pci/pci_dma.c @@ -10,7 +10,7 @@ */ #include -#include +#include #include #include #include @@ -31,7 +31,7 @@ * this function. Of course, SN only supports devices that have 32 or more * address bits when using the PMU. */ -int sn_dma_supported(struct device *dev, u64 mask) +static int sn_dma_supported(struct device *dev, u64 mask) { BUG_ON(dev->bus != &pci_bus_type); @@ -39,7 +39,6 @@ int sn_dma_supported(struct device *dev, return 0; return 1; } -EXPORT_SYMBOL(sn_dma_supported); /** * sn_dma_set_mask - set the DMA mask @@ -75,8 +74,8 @@ EXPORT_SYMBOL(sn_dma_set_mask); * queue for a SCSI controller). See Documentation/DMA-API.txt for * more information. */ -void *sn_dma_alloc_coherent(struct device *dev, size_t size, - dma_addr_t * dma_handle, gfp_t flags) +static void *sn_dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t * dma_handle, gfp_t flags) { void *cpuaddr; unsigned long phys_addr; @@ -124,7 +123,6 @@ void *sn_dma_alloc_coherent(struct devic return cpuaddr; } -EXPORT_SYMBOL(sn_dma_alloc_coherent); /** * sn_pci_free_coherent - free memory associated with coherent DMAable region @@ -136,8 +134,8 @@ EXPORT_SYMBOL(sn_dma_alloc_coherent); * Frees the memory allocated by dma_alloc_coherent(), potentially unmapping * any associated IOMMU mappings. */ -void sn_dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, - dma_addr_t dma_handle) +static void sn_dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t dma_handle) { struct pci_dev *pdev = to_pci_dev(dev); struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev); @@ -147,7 +145,6 @@ void sn_dma_free_coherent(struct device provider->dma_unmap(pdev, dma_handle, 0); free_pages((unsigned long)cpu_addr, get_order(size)); } -EXPORT_SYMBOL(sn_dma_free_coherent); /** * sn_dma_map_single_attrs - map a single page for DMA @@ -173,10 +170,12 @@ EXPORT_SYMBOL(sn_dma_free_coherent); * TODO: simplify our interface; * figure out how to save dmamap handle so can use two step. */ -dma_addr_t sn_dma_map_single_attrs(struct device *dev, void *cpu_addr, - size_t size, int direction, - struct dma_attrs *attrs) +static dma_addr_t sn_dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + struct dma_attrs *attrs) { + void *cpu_addr = page_address(page) + offset; dma_addr_t dma_addr; unsigned long phys_addr; struct pci_dev *pdev = to_pci_dev(dev); @@ -201,7 +200,6 @@ dma_addr_t sn_dma_map_single_attrs(struc } return dma_addr; } -EXPORT_SYMBOL(sn_dma_map_single_attrs); /** * sn_dma_unmap_single_attrs - unamp a DMA mapped page @@ -215,21 +213,20 @@ EXPORT_SYMBOL(sn_dma_map_single_attrs); * by @dma_handle into the coherence domain. On SN, we're always cache * coherent, so we just need to free any ATEs associated with this mapping. */ -void sn_dma_unmap_single_attrs(struct device *dev, dma_addr_t dma_addr, - size_t size, int direction, - struct dma_attrs *attrs) +static void sn_dma_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs) { struct pci_dev *pdev = to_pci_dev(dev); struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev); BUG_ON(dev->bus != &pci_bus_type); - provider->dma_unmap(pdev, dma_addr, direction); + provider->dma_unmap(pdev, dma_addr, dir); } -EXPORT_SYMBOL(sn_dma_unmap_single_attrs); /** - * sn_dma_unmap_sg_attrs - unmap a DMA scatterlist + * sn_dma_unmap_sg - unmap a DMA scatterlist * @dev: device to unmap * @sg: scatterlist to unmap * @nhwentries: number of scatterlist entries @@ -238,9 +235,9 @@ EXPORT_SYMBOL(sn_dma_unmap_single_attrs) * * Unmap a set of streaming mode DMA translations. */ -void sn_dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sgl, - int nhwentries, int direction, - struct dma_attrs *attrs) +static void sn_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, + int nhwentries, enum dma_data_direction dir, + struct dma_attrs *attrs) { int i; struct pci_dev *pdev = to_pci_dev(dev); @@ -250,15 +247,14 @@ void sn_dma_unmap_sg_attrs(struct device BUG_ON(dev->bus != &pci_bus_type); for_each_sg(sgl, sg, nhwentries, i) { - provider->dma_unmap(pdev, sg->dma_address, direction); + provider->dma_unmap(pdev, sg->dma_address, dir); sg->dma_address = (dma_addr_t) NULL; sg->dma_length = 0; } } -EXPORT_SYMBOL(sn_dma_unmap_sg_attrs); /** - * sn_dma_map_sg_attrs - map a scatterlist for DMA + * sn_dma_map_sg - map a scatterlist for DMA * @dev: device to map for * @sg: scatterlist to map * @nhwentries: number of entries @@ -272,8 +268,9 @@ EXPORT_SYMBOL(sn_dma_unmap_sg_attrs); * * Maps each entry of @sg for DMA. */ -int sn_dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, - int nhwentries, int direction, struct dma_attrs *attrs) +static int sn_dma_map_sg(struct device *dev, struct scatterlist *sgl, + int nhwentries, enum dma_data_direction dir, + struct dma_attrs *attrs) { unsigned long phys_addr; struct scatterlist *saved_sg = sgl, *sg; @@ -310,8 +307,7 @@ int sn_dma_map_sg_attrs(struct device *d * Free any successfully allocated entries. */ if (i > 0) - sn_dma_unmap_sg_attrs(dev, saved_sg, i, - direction, attrs); + sn_dma_unmap_sg(dev, saved_sg, i, dir, attrs); return 0; } @@ -320,41 +316,36 @@ int sn_dma_map_sg_attrs(struct device *d return nhwentries; } -EXPORT_SYMBOL(sn_dma_map_sg_attrs); -void sn_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, - size_t size, int direction) +static void sn_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, + size_t size, enum dma_data_direction dir) { BUG_ON(dev->bus != &pci_bus_type); } -EXPORT_SYMBOL(sn_dma_sync_single_for_cpu); -void sn_dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, - size_t size, int direction) +static void sn_dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, + size_t size, + enum dma_data_direction dir) { BUG_ON(dev->bus != &pci_bus_type); } -EXPORT_SYMBOL(sn_dma_sync_single_for_device); -void sn_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, - int nelems, int direction) +static void sn_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, + int nelems, enum dma_data_direction dir) { BUG_ON(dev->bus != &pci_bus_type); } -EXPORT_SYMBOL(sn_dma_sync_sg_for_cpu); -void sn_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, - int nelems, int direction) +static void sn_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, + int nelems, enum dma_data_direction dir) { BUG_ON(dev->bus != &pci_bus_type); } -EXPORT_SYMBOL(sn_dma_sync_sg_for_device); -int sn_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) +static int sn_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) { return 0; } -EXPORT_SYMBOL(sn_dma_mapping_error); u64 sn_dma_get_required_mask(struct device *dev) { @@ -471,3 +462,23 @@ int sn_pci_legacy_write(struct pci_bus * out: return ret; } + +static struct dma_map_ops sn_dma_ops = { + .alloc_coherent = sn_dma_alloc_coherent, + .free_coherent = sn_dma_free_coherent, + .map_page = sn_dma_map_page, + .unmap_page = sn_dma_unmap_page, + .map_sg = sn_dma_map_sg, + .unmap_sg = sn_dma_unmap_sg, + .sync_single_for_cpu = sn_dma_sync_single_for_cpu, + .sync_sg_for_cpu = sn_dma_sync_sg_for_cpu, + .sync_single_for_device = sn_dma_sync_single_for_device, + .sync_sg_for_device = sn_dma_sync_sg_for_device, + .mapping_error = sn_dma_mapping_error, + .dma_supported = sn_dma_supported, +}; + +void sn_dma_init(void) +{ + dma_ops = &sn_dma_ops; +} Index: linux-2.6-tip/arch/m32r/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/m32r/kernel/irq.c +++ linux-2.6-tip/arch/m32r/kernel/irq.c @@ -49,7 +49,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%10u ", kstat_irqs(i)); #else for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #endif seq_printf(p, " %14s", irq_desc[i].chip->typename); seq_printf(p, " %s", action->name); Index: linux-2.6-tip/arch/m68k/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/m68k/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/mips/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/mips/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/mips/include/asm/irq.h =================================================================== --- linux-2.6-tip.orig/arch/mips/include/asm/irq.h +++ linux-2.6-tip/arch/mips/include/asm/irq.h @@ -66,7 +66,7 @@ extern void smtc_forward_irq(unsigned in */ #define IRQ_AFFINITY_HOOK(irq) \ do { \ - if (!cpu_isset(smp_processor_id(), irq_desc[irq].affinity)) { \ + if (!cpumask_test_cpu(smp_processor_id(), irq_desc[irq].affinity)) {\ smtc_forward_irq(irq); \ irq_exit(); \ return; \ Index: linux-2.6-tip/arch/mips/include/asm/sigcontext.h =================================================================== --- linux-2.6-tip.orig/arch/mips/include/asm/sigcontext.h +++ linux-2.6-tip/arch/mips/include/asm/sigcontext.h @@ -9,6 +9,7 @@ #ifndef _ASM_SIGCONTEXT_H #define _ASM_SIGCONTEXT_H +#include #include #if _MIPS_SIM == _MIPS_SIM_ABI32 Index: linux-2.6-tip/arch/mips/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/mips/include/asm/swab.h +++ linux-2.6-tip/arch/mips/include/asm/swab.h @@ -9,7 +9,7 @@ #define _ASM_SWAB_H #include -#include +#include #define __SWAB_64_THRU_32__ Index: linux-2.6-tip/arch/mips/kernel/irq-gic.c =================================================================== --- linux-2.6-tip.orig/arch/mips/kernel/irq-gic.c +++ linux-2.6-tip/arch/mips/kernel/irq-gic.c @@ -187,7 +187,7 @@ static void gic_set_affinity(unsigned in set_bit(irq, pcpu_masks[first_cpu(tmp)].pcpu_mask); } - irq_desc[irq].affinity = *cpumask; + cpumask_copy(irq_desc[irq].affinity, cpumask); spin_unlock_irqrestore(&gic_lock, flags); } Index: linux-2.6-tip/arch/mips/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/mips/kernel/irq.c +++ linux-2.6-tip/arch/mips/kernel/irq.c @@ -108,7 +108,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%10u ", kstat_irqs(i)); #else for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #endif seq_printf(p, " %14s", irq_desc[i].chip->name); seq_printf(p, " %s", action->name); Index: linux-2.6-tip/arch/mips/kernel/smtc.c =================================================================== --- linux-2.6-tip.orig/arch/mips/kernel/smtc.c +++ linux-2.6-tip/arch/mips/kernel/smtc.c @@ -686,7 +686,7 @@ void smtc_forward_irq(unsigned int irq) * and efficiency, we just pick the easiest one to find. */ - target = first_cpu(irq_desc[irq].affinity); + target = cpumask_first(irq_desc[irq].affinity); /* * We depend on the platform code to have correctly processed @@ -921,11 +921,13 @@ void ipi_decode(struct smtc_ipi *pipi) struct clock_event_device *cd; void *arg_copy = pipi->arg; int type_copy = pipi->type; + int irq = MIPS_CPU_IRQ_BASE + 1; + smtc_ipi_nq(&freeIPIq, pipi); switch (type_copy) { case SMTC_CLOCK_TICK: irq_enter(); - kstat_this_cpu.irqs[MIPS_CPU_IRQ_BASE + 1]++; + kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); cd = &per_cpu(mips_clockevent_device, cpu); cd->event_handler(cd); irq_exit(); Index: linux-2.6-tip/arch/mips/mti-malta/malta-smtc.c =================================================================== --- linux-2.6-tip.orig/arch/mips/mti-malta/malta-smtc.c +++ linux-2.6-tip/arch/mips/mti-malta/malta-smtc.c @@ -116,7 +116,7 @@ struct plat_smp_ops msmtc_smp_ops = { void plat_set_irq_affinity(unsigned int irq, const struct cpumask *affinity) { - cpumask_t tmask = *affinity; + cpumask_t tmask; int cpu = 0; void smtc_set_irq_affinity(unsigned int irq, cpumask_t aff); @@ -139,11 +139,12 @@ void plat_set_irq_affinity(unsigned int * be made to forward to an offline "CPU". */ + cpumask_copy(&tmask, affinity); for_each_cpu(cpu, affinity) { if ((cpu_data[cpu].vpe_id != 0) || !cpu_online(cpu)) cpu_clear(cpu, tmask); } - irq_desc[irq].affinity = tmask; + cpumask_copy(irq_desc[irq].affinity, &tmask); if (cpus_empty(tmask)) /* Index: linux-2.6-tip/arch/mips/sgi-ip22/ip22-int.c =================================================================== --- linux-2.6-tip.orig/arch/mips/sgi-ip22/ip22-int.c +++ linux-2.6-tip/arch/mips/sgi-ip22/ip22-int.c @@ -155,7 +155,7 @@ static void indy_buserror_irq(void) int irq = SGI_BUSERR_IRQ; irq_enter(); - kstat_this_cpu.irqs[irq]++; + kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); ip22_be_interrupt(irq); irq_exit(); } Index: linux-2.6-tip/arch/mips/sgi-ip22/ip22-time.c =================================================================== --- linux-2.6-tip.orig/arch/mips/sgi-ip22/ip22-time.c +++ linux-2.6-tip/arch/mips/sgi-ip22/ip22-time.c @@ -122,7 +122,7 @@ void indy_8254timer_irq(void) char c; irq_enter(); - kstat_this_cpu.irqs[irq]++; + kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); printk(KERN_ALERT "Oops, got 8254 interrupt.\n"); ArcRead(0, &c, 1, &cnt); ArcEnterInteractiveMode(); Index: linux-2.6-tip/arch/mips/sibyte/bcm1480/smp.c =================================================================== --- linux-2.6-tip.orig/arch/mips/sibyte/bcm1480/smp.c +++ linux-2.6-tip/arch/mips/sibyte/bcm1480/smp.c @@ -178,9 +178,10 @@ struct plat_smp_ops bcm1480_smp_ops = { void bcm1480_mailbox_interrupt(void) { int cpu = smp_processor_id(); + int irq = K_BCM1480_INT_MBOX_0_0; unsigned int action; - kstat_this_cpu.irqs[K_BCM1480_INT_MBOX_0_0]++; + kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); /* Load the mailbox register to figure out what we're supposed to do */ action = (__raw_readq(mailbox_0_regs[cpu]) >> 48) & 0xffff; Index: linux-2.6-tip/arch/mips/sibyte/sb1250/smp.c =================================================================== --- linux-2.6-tip.orig/arch/mips/sibyte/sb1250/smp.c +++ linux-2.6-tip/arch/mips/sibyte/sb1250/smp.c @@ -166,9 +166,10 @@ struct plat_smp_ops sb_smp_ops = { void sb1250_mailbox_interrupt(void) { int cpu = smp_processor_id(); + int irq = K_INT_MBOX_0; unsigned int action; - kstat_this_cpu.irqs[K_INT_MBOX_0]++; + kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); /* Load the mailbox register to figure out what we're supposed to do */ action = (____raw_readq(mailbox_regs[cpu]) >> 48) & 0xffff; Index: linux-2.6-tip/arch/mn10300/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/mn10300/kernel/irq.c +++ linux-2.6-tip/arch/mn10300/kernel/irq.c @@ -221,7 +221,7 @@ int show_interrupts(struct seq_file *p, if (action) { seq_printf(p, "%3d: ", i); for_each_present_cpu(cpu) - seq_printf(p, "%10u ", kstat_cpu(cpu).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, cpu)); seq_printf(p, " %14s.%u", irq_desc[i].chip->name, (GxICR(i) & GxICR_LEVEL) >> GxICR_LEVEL_SHIFT); Index: linux-2.6-tip/arch/mn10300/kernel/mn10300-watchdog.c =================================================================== --- linux-2.6-tip.orig/arch/mn10300/kernel/mn10300-watchdog.c +++ linux-2.6-tip/arch/mn10300/kernel/mn10300-watchdog.c @@ -130,6 +130,7 @@ void watchdog_interrupt(struct pt_regs * * the stack NMI-atomically, it's safe to use smp_processor_id(). */ int sum, cpu = smp_processor_id(); + int irq = NMIIRQ; u8 wdt, tmp; wdt = WDCTR & ~WDCTR_WDCNE; @@ -138,7 +139,7 @@ void watchdog_interrupt(struct pt_regs * NMICR = NMICR_WDIF; nmi_count(cpu)++; - kstat_this_cpu.irqs[NMIIRQ]++; + kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq)); sum = irq_stat[cpu].__irq_count; if (last_irq_sums[cpu] == sum) { Index: linux-2.6-tip/arch/parisc/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/parisc/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/parisc/include/asm/pdc.h =================================================================== --- linux-2.6-tip.orig/arch/parisc/include/asm/pdc.h +++ linux-2.6-tip/arch/parisc/include/asm/pdc.h @@ -336,10 +336,11 @@ #define NUM_PDC_RESULT 32 #if !defined(__ASSEMBLY__) -#ifdef __KERNEL__ #include +#ifdef __KERNEL__ + extern int pdc_type; /* Values for pdc_type */ Index: linux-2.6-tip/arch/parisc/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/parisc/include/asm/swab.h +++ linux-2.6-tip/arch/parisc/include/asm/swab.h @@ -1,7 +1,7 @@ #ifndef _PARISC_SWAB_H #define _PARISC_SWAB_H -#include +#include #include #define __SWAB_64_THRU_32__ Index: linux-2.6-tip/arch/parisc/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/parisc/kernel/irq.c +++ linux-2.6-tip/arch/parisc/kernel/irq.c @@ -138,7 +138,7 @@ static void cpu_set_affinity_irq(unsigne if (cpu_dest < 0) return; - cpumask_copy(&irq_desc[irq].affinity, &cpumask_of_cpu(cpu_dest)); + cpumask_copy(&irq_desc[irq].affinity, dest); } #endif @@ -185,7 +185,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%3d: ", i); #ifdef CONFIG_SMP for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #else seq_printf(p, "%10u ", kstat_irqs(i)); #endif Index: linux-2.6-tip/arch/powerpc/include/asm/bootx.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/bootx.h +++ linux-2.6-tip/arch/powerpc/include/asm/bootx.h @@ -9,7 +9,7 @@ #ifndef __ASM_BOOTX_H__ #define __ASM_BOOTX_H__ -#include +#include #ifdef macintosh #include Index: linux-2.6-tip/arch/powerpc/include/asm/elf.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/elf.h +++ linux-2.6-tip/arch/powerpc/include/asm/elf.h @@ -7,7 +7,7 @@ #include #endif -#include +#include #include #include #include Index: linux-2.6-tip/arch/powerpc/include/asm/hw_irq.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/hw_irq.h +++ linux-2.6-tip/arch/powerpc/include/asm/hw_irq.h @@ -131,5 +131,44 @@ static inline int irqs_disabled_flags(un */ struct hw_interrupt_type; +#ifdef CONFIG_PERF_COUNTERS +static inline unsigned long get_perf_counter_pending(void) +{ + unsigned long x; + + asm volatile("lbz %0,%1(13)" + : "=r" (x) + : "i" (offsetof(struct paca_struct, perf_counter_pending))); + return x; +} + +static inline void set_perf_counter_pending(void) +{ + asm volatile("stb %0,%1(13)" : : + "r" (1), + "i" (offsetof(struct paca_struct, perf_counter_pending))); +} + +static inline void clear_perf_counter_pending(void) +{ + asm volatile("stb %0,%1(13)" : : + "r" (0), + "i" (offsetof(struct paca_struct, perf_counter_pending))); +} + +extern void perf_counter_do_pending(void); + +#else + +static inline unsigned long get_perf_counter_pending(void) +{ + return 0; +} + +static inline void set_perf_counter_pending(void) {} +static inline void clear_perf_counter_pending(void) {} +static inline void perf_counter_do_pending(void) {} +#endif /* CONFIG_PERF_COUNTERS */ + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_HW_IRQ_H */ Index: linux-2.6-tip/arch/powerpc/include/asm/kvm.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/kvm.h +++ linux-2.6-tip/arch/powerpc/include/asm/kvm.h @@ -20,7 +20,7 @@ #ifndef __LINUX_KVM_POWERPC_H #define __LINUX_KVM_POWERPC_H -#include +#include struct kvm_regs { __u64 pc; Index: linux-2.6-tip/arch/powerpc/include/asm/mmzone.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/mmzone.h +++ linux-2.6-tip/arch/powerpc/include/asm/mmzone.h @@ -8,6 +8,7 @@ #define _ASM_MMZONE_H_ #ifdef __KERNEL__ +#include /* * generic non-linear memory support: Index: linux-2.6-tip/arch/powerpc/include/asm/paca.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/paca.h +++ linux-2.6-tip/arch/powerpc/include/asm/paca.h @@ -99,6 +99,7 @@ struct paca_struct { u8 soft_enabled; /* irq soft-enable flag */ u8 hard_enabled; /* set if irqs are enabled in MSR */ u8 io_sync; /* writel() needs spin_unlock sync */ + u8 perf_counter_pending; /* PM interrupt while soft-disabled */ /* Stuff for accurate time accounting */ u64 user_time; /* accumulated usermode TB ticks */ Index: linux-2.6-tip/arch/powerpc/include/asm/perf_counter.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/include/asm/perf_counter.h @@ -0,0 +1,72 @@ +/* + * Performance counter support - PowerPC-specific definitions. + * + * Copyright 2008-2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include + +#define MAX_HWCOUNTERS 8 +#define MAX_EVENT_ALTERNATIVES 8 + +/* + * This struct provides the constants and functions needed to + * describe the PMU on a particular POWER-family CPU. + */ +struct power_pmu { + int n_counter; + int max_alternatives; + u64 add_fields; + u64 test_adder; + int (*compute_mmcr)(unsigned int events[], int n_ev, + unsigned int hwc[], u64 mmcr[]); + int (*get_constraint)(unsigned int event, u64 *mskp, u64 *valp); + int (*get_alternatives)(unsigned int event, unsigned int alt[]); + void (*disable_pmc)(unsigned int pmc, u64 mmcr[]); + int n_generic; + int *generic_events; +}; + +extern struct power_pmu *ppmu; + +/* + * The power_pmu.get_constraint function returns a 64-bit value and + * a 64-bit mask that express the constraints between this event and + * other events. + * + * The value and mask are divided up into (non-overlapping) bitfields + * of three different types: + * + * Select field: this expresses the constraint that some set of bits + * in MMCR* needs to be set to a specific value for this event. For a + * select field, the mask contains 1s in every bit of the field, and + * the value contains a unique value for each possible setting of the + * MMCR* bits. The constraint checking code will ensure that two events + * that set the same field in their masks have the same value in their + * value dwords. + * + * Add field: this expresses the constraint that there can be at most + * N events in a particular class. A field of k bits can be used for + * N <= 2^(k-1) - 1. The mask has the most significant bit of the field + * set (and the other bits 0), and the value has only the least significant + * bit of the field set. In addition, the 'add_fields' and 'test_adder' + * in the struct power_pmu for this processor come into play. The + * add_fields value contains 1 in the LSB of the field, and the + * test_adder contains 2^(k-1) - 1 - N in the field. + * + * NAND field: this expresses the constraint that you may not have events + * in all of a set of classes. (For example, on PPC970, you can't select + * events from the FPU, ISU and IDU simultaneously, although any two are + * possible.) For N classes, the field is N+1 bits wide, and each class + * is assigned one bit from the least-significant N bits. The mask has + * only the most-significant bit set, and the value has only the bit + * for the event's class set. The test_adder has the least significant + * bit set in the field. + * + * If an event is not subject to the constraint expressed by a particular + * field, then it will have 0 in both the mask and value for that field. + */ Index: linux-2.6-tip/arch/powerpc/include/asm/ps3fb.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/ps3fb.h +++ linux-2.6-tip/arch/powerpc/include/asm/ps3fb.h @@ -19,6 +19,7 @@ #ifndef _ASM_POWERPC_PS3FB_H_ #define _ASM_POWERPC_PS3FB_H_ +#include #include /* ioctl */ Index: linux-2.6-tip/arch/powerpc/include/asm/spu_info.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/spu_info.h +++ linux-2.6-tip/arch/powerpc/include/asm/spu_info.h @@ -23,9 +23,10 @@ #ifndef _SPU_INFO_H #define _SPU_INFO_H +#include + #ifdef __KERNEL__ #include -#include #else struct mfc_cq_sr { __u64 mfc_cq_data0_RW; Index: linux-2.6-tip/arch/powerpc/include/asm/swab.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/swab.h +++ linux-2.6-tip/arch/powerpc/include/asm/swab.h @@ -8,7 +8,7 @@ * 2 of the License, or (at your option) any later version. */ -#include +#include #include #ifdef __GNUC__ Index: linux-2.6-tip/arch/powerpc/include/asm/systbl.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/systbl.h +++ linux-2.6-tip/arch/powerpc/include/asm/systbl.h @@ -322,3 +322,4 @@ SYSCALL_SPU(epoll_create1) SYSCALL_SPU(dup3) SYSCALL_SPU(pipe2) SYSCALL(inotify_init1) +SYSCALL_SPU(perf_counter_open) Index: linux-2.6-tip/arch/powerpc/include/asm/unistd.h =================================================================== --- linux-2.6-tip.orig/arch/powerpc/include/asm/unistd.h +++ linux-2.6-tip/arch/powerpc/include/asm/unistd.h @@ -341,10 +341,11 @@ #define __NR_dup3 316 #define __NR_pipe2 317 #define __NR_inotify_init1 318 +#define __NR_perf_counter_open 319 #ifdef __KERNEL__ -#define __NR_syscalls 319 +#define __NR_syscalls 320 #define __NR__exit __NR_exit #define NR_syscalls __NR_syscalls Index: linux-2.6-tip/arch/powerpc/kernel/Makefile =================================================================== --- linux-2.6-tip.orig/arch/powerpc/kernel/Makefile +++ linux-2.6-tip/arch/powerpc/kernel/Makefile @@ -94,6 +94,8 @@ obj-$(CONFIG_AUDIT) += audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o +obj-$(CONFIG_PERF_COUNTERS) += perf_counter.o power4-pmu.o ppc970-pmu.o \ + power5-pmu.o power5+-pmu.o power6-pmu.o obj-$(CONFIG_8XX_MINIMAL_FPEMU) += softemu8xx.o Index: linux-2.6-tip/arch/powerpc/kernel/asm-offsets.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/kernel/asm-offsets.c +++ linux-2.6-tip/arch/powerpc/kernel/asm-offsets.c @@ -131,6 +131,7 @@ int main(void) DEFINE(PACAKMSR, offsetof(struct paca_struct, kernel_msr)); DEFINE(PACASOFTIRQEN, offsetof(struct paca_struct, soft_enabled)); DEFINE(PACAHARDIRQEN, offsetof(struct paca_struct, hard_enabled)); + DEFINE(PACAPERFPEND, offsetof(struct paca_struct, perf_counter_pending)); DEFINE(PACASLBCACHE, offsetof(struct paca_struct, slb_cache)); DEFINE(PACASLBCACHEPTR, offsetof(struct paca_struct, slb_cache_ptr)); DEFINE(PACACONTEXTID, offsetof(struct paca_struct, context.id)); Index: linux-2.6-tip/arch/powerpc/kernel/entry_64.S =================================================================== --- linux-2.6-tip.orig/arch/powerpc/kernel/entry_64.S +++ linux-2.6-tip/arch/powerpc/kernel/entry_64.S @@ -526,6 +526,15 @@ ALT_FW_FTR_SECTION_END_IFCLR(FW_FEATURE_ 2: TRACE_AND_RESTORE_IRQ(r5); +#ifdef CONFIG_PERF_COUNTERS + /* check paca->perf_counter_pending if we're enabling ints */ + lbz r3,PACAPERFPEND(r13) + and. r3,r3,r5 + beq 27f + bl .perf_counter_do_pending +27: +#endif /* CONFIG_PERF_COUNTERS */ + /* extract EE bit and use it to restore paca->hard_enabled */ ld r3,_MSR(r1) rldicl r4,r3,49,63 /* r0 = (r3 >> 15) & 1 */ @@ -616,44 +625,52 @@ do_work: bne restore /* here we are preempting the current task */ 1: + /* + * preempt_schedule_irq() expects interrupts disabled and returns + * with interrupts disabled. No need to check preemption again, + * preempt_schedule_irq just did that for us. + */ + bl .preempt_schedule_irq #ifdef CONFIG_TRACE_IRQFLAGS bl .trace_hardirqs_on +#endif /* CONFIG_TRACE_IRQFLAGS */ + /* Note: we just clobbered r10 which used to contain the previous * MSR before the hard-disabling done by the caller of do_work. * We don't have that value anymore, but it doesn't matter as * we will hard-enable unconditionally, we can just reload the * current MSR into r10 */ + bl .preempt_schedule_irq mfmsr r10 -#endif /* CONFIG_TRACE_IRQFLAGS */ - li r0,1 - stb r0,PACASOFTIRQEN(r13) - stb r0,PACAHARDIRQEN(r13) - ori r10,r10,MSR_EE - mtmsrd r10,1 /* reenable interrupts */ - bl .preempt_schedule - mfmsr r10 - clrrdi r9,r1,THREAD_SHIFT - rldicl r10,r10,48,1 /* disable interrupts again */ - rotldi r10,r10,16 - mtmsrd r10,1 - ld r4,TI_FLAGS(r9) - andi. r0,r4,_TIF_NEED_RESCHED - bne 1b + clrrdi r9,r1,THREAD_SHIFT + rldicl r10,r10,48,1 /* disable interrupts again */ + rotldi r10,r10,16 + mtmsrd r10,1 + ld r4,TI_FLAGS(r9) + andi. r0,r4,(_TIF_NEED_RESCHED) + bne 1b b restore user_work: #endif - /* Enable interrupts */ - ori r10,r10,MSR_EE - mtmsrd r10,1 - andi. r0,r4,_TIF_NEED_RESCHED beq 1f - bl .schedule + + /* preempt_schedule_irq() expects interrupts disabled. */ + bl .preempt_schedule_irq b .ret_from_except_lite -1: bl .save_nvgprs + /* here we are preempting the current task */ +1: li r0,1 + stb r0,PACASOFTIRQEN(r13) + stb r0,PACAHARDIRQEN(r13) + + /* Enable interrupts */ + ori r10,r10,MSR_EE + mtmsrd r10,1 + + bl .save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD bl .do_signal b .ret_from_except Index: linux-2.6-tip/arch/powerpc/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/kernel/irq.c +++ linux-2.6-tip/arch/powerpc/kernel/irq.c @@ -135,6 +135,11 @@ notrace void raw_local_irq_restore(unsig iseries_handle_interrupts(); } + if (get_perf_counter_pending()) { + clear_perf_counter_pending(); + perf_counter_do_pending(); + } + /* * if (get_paca()->hard_enabled) return; * But again we need to take care that gcc gets hard_enabled directly @@ -190,7 +195,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%3d: ", i); #ifdef CONFIG_SMP for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #else seq_printf(p, "%10u ", kstat_irqs(i)); #endif /* CONFIG_SMP */ @@ -231,7 +236,7 @@ void fixup_irqs(cpumask_t map) if (irq_desc[irq].status & IRQ_PER_CPU) continue; - cpus_and(mask, irq_desc[irq].affinity, map); + cpumask_and(&mask, irq_desc[irq].affinity, &map); if (any_online_cpu(mask) == NR_CPUS) { printk("Breaking affinity for irq %i\n", irq); mask = map; @@ -438,7 +443,7 @@ void do_softirq(void) */ static LIST_HEAD(irq_hosts); -static DEFINE_SPINLOCK(irq_big_lock); +static DEFINE_RAW_SPINLOCK(irq_big_lock); static unsigned int revmap_trees_allocated; static DEFINE_MUTEX(revmap_trees_mutex); struct irq_map_entry irq_map[NR_IRQS]; Index: linux-2.6-tip/arch/powerpc/kernel/perf_counter.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/kernel/perf_counter.c @@ -0,0 +1,827 @@ +/* + * Performance counter support - powerpc architecture code + * + * Copyright 2008-2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct cpu_hw_counters { + int n_counters; + int n_percpu; + int disabled; + int n_added; + struct perf_counter *counter[MAX_HWCOUNTERS]; + unsigned int events[MAX_HWCOUNTERS]; + u64 mmcr[3]; + u8 pmcs_enabled; +}; +DEFINE_PER_CPU(struct cpu_hw_counters, cpu_hw_counters); + +struct power_pmu *ppmu; + +/* + * Normally, to ignore kernel events we set the FCS (freeze counters + * in supervisor mode) bit in MMCR0, but if the kernel runs with the + * hypervisor bit set in the MSR, or if we are running on a processor + * where the hypervisor bit is forced to 1 (as on Apple G5 processors), + * then we need to use the FCHV bit to ignore kernel events. + */ +static unsigned int freeze_counters_kernel = MMCR0_FCS; + +void perf_counter_print_debug(void) +{ +} + +/* + * Read one performance monitor counter (PMC). + */ +static unsigned long read_pmc(int idx) +{ + unsigned long val; + + switch (idx) { + case 1: + val = mfspr(SPRN_PMC1); + break; + case 2: + val = mfspr(SPRN_PMC2); + break; + case 3: + val = mfspr(SPRN_PMC3); + break; + case 4: + val = mfspr(SPRN_PMC4); + break; + case 5: + val = mfspr(SPRN_PMC5); + break; + case 6: + val = mfspr(SPRN_PMC6); + break; + case 7: + val = mfspr(SPRN_PMC7); + break; + case 8: + val = mfspr(SPRN_PMC8); + break; + default: + printk(KERN_ERR "oops trying to read PMC%d\n", idx); + val = 0; + } + return val; +} + +/* + * Write one PMC. + */ +static void write_pmc(int idx, unsigned long val) +{ + switch (idx) { + case 1: + mtspr(SPRN_PMC1, val); + break; + case 2: + mtspr(SPRN_PMC2, val); + break; + case 3: + mtspr(SPRN_PMC3, val); + break; + case 4: + mtspr(SPRN_PMC4, val); + break; + case 5: + mtspr(SPRN_PMC5, val); + break; + case 6: + mtspr(SPRN_PMC6, val); + break; + case 7: + mtspr(SPRN_PMC7, val); + break; + case 8: + mtspr(SPRN_PMC8, val); + break; + default: + printk(KERN_ERR "oops trying to write PMC%d\n", idx); + } +} + +/* + * Check if a set of events can all go on the PMU at once. + * If they can't, this will look at alternative codes for the events + * and see if any combination of alternative codes is feasible. + * The feasible set is returned in event[]. + */ +static int power_check_constraints(unsigned int event[], int n_ev) +{ + u64 mask, value, nv; + unsigned int alternatives[MAX_HWCOUNTERS][MAX_EVENT_ALTERNATIVES]; + u64 amasks[MAX_HWCOUNTERS][MAX_EVENT_ALTERNATIVES]; + u64 avalues[MAX_HWCOUNTERS][MAX_EVENT_ALTERNATIVES]; + u64 smasks[MAX_HWCOUNTERS], svalues[MAX_HWCOUNTERS]; + int n_alt[MAX_HWCOUNTERS], choice[MAX_HWCOUNTERS]; + int i, j; + u64 addf = ppmu->add_fields; + u64 tadd = ppmu->test_adder; + + if (n_ev > ppmu->n_counter) + return -1; + + /* First see if the events will go on as-is */ + for (i = 0; i < n_ev; ++i) { + alternatives[i][0] = event[i]; + if (ppmu->get_constraint(event[i], &amasks[i][0], + &avalues[i][0])) + return -1; + choice[i] = 0; + } + value = mask = 0; + for (i = 0; i < n_ev; ++i) { + nv = (value | avalues[i][0]) + (value & avalues[i][0] & addf); + if ((((nv + tadd) ^ value) & mask) != 0 || + (((nv + tadd) ^ avalues[i][0]) & amasks[i][0]) != 0) + break; + value = nv; + mask |= amasks[i][0]; + } + if (i == n_ev) + return 0; /* all OK */ + + /* doesn't work, gather alternatives... */ + if (!ppmu->get_alternatives) + return -1; + for (i = 0; i < n_ev; ++i) { + n_alt[i] = ppmu->get_alternatives(event[i], alternatives[i]); + for (j = 1; j < n_alt[i]; ++j) + ppmu->get_constraint(alternatives[i][j], + &amasks[i][j], &avalues[i][j]); + } + + /* enumerate all possibilities and see if any will work */ + i = 0; + j = -1; + value = mask = nv = 0; + while (i < n_ev) { + if (j >= 0) { + /* we're backtracking, restore context */ + value = svalues[i]; + mask = smasks[i]; + j = choice[i]; + } + /* + * See if any alternative k for event i, + * where k > j, will satisfy the constraints. + */ + while (++j < n_alt[i]) { + nv = (value | avalues[i][j]) + + (value & avalues[i][j] & addf); + if ((((nv + tadd) ^ value) & mask) == 0 && + (((nv + tadd) ^ avalues[i][j]) + & amasks[i][j]) == 0) + break; + } + if (j >= n_alt[i]) { + /* + * No feasible alternative, backtrack + * to event i-1 and continue enumerating its + * alternatives from where we got up to. + */ + if (--i < 0) + return -1; + } else { + /* + * Found a feasible alternative for event i, + * remember where we got up to with this event, + * go on to the next event, and start with + * the first alternative for it. + */ + choice[i] = j; + svalues[i] = value; + smasks[i] = mask; + value = nv; + mask |= amasks[i][j]; + ++i; + j = -1; + } + } + + /* OK, we have a feasible combination, tell the caller the solution */ + for (i = 0; i < n_ev; ++i) + event[i] = alternatives[i][choice[i]]; + return 0; +} + +/* + * Check if newly-added counters have consistent settings for + * exclude_{user,kernel,hv} with each other and any previously + * added counters. + */ +static int check_excludes(struct perf_counter **ctrs, int n_prev, int n_new) +{ + int eu, ek, eh; + int i, n; + struct perf_counter *counter; + + n = n_prev + n_new; + if (n <= 1) + return 0; + + eu = ctrs[0]->hw_event.exclude_user; + ek = ctrs[0]->hw_event.exclude_kernel; + eh = ctrs[0]->hw_event.exclude_hv; + if (n_prev == 0) + n_prev = 1; + for (i = n_prev; i < n; ++i) { + counter = ctrs[i]; + if (counter->hw_event.exclude_user != eu || + counter->hw_event.exclude_kernel != ek || + counter->hw_event.exclude_hv != eh) + return -EAGAIN; + } + return 0; +} + +static void power_perf_read(struct perf_counter *counter) +{ + long val, delta, prev; + + if (!counter->hw.idx) + return; + /* + * Performance monitor interrupts come even when interrupts + * are soft-disabled, as long as interrupts are hard-enabled. + * Therefore we treat them like NMIs. + */ + do { + prev = atomic64_read(&counter->hw.prev_count); + barrier(); + val = read_pmc(counter->hw.idx); + } while (atomic64_cmpxchg(&counter->hw.prev_count, prev, val) != prev); + + /* The counters are only 32 bits wide */ + delta = (val - prev) & 0xfffffffful; + atomic64_add(delta, &counter->count); + atomic64_sub(delta, &counter->hw.period_left); +} + +/* + * Disable all counters to prevent PMU interrupts and to allow + * counters to be added or removed. + */ +u64 hw_perf_save_disable(void) +{ + struct cpu_hw_counters *cpuhw; + unsigned long ret; + unsigned long flags; + + local_irq_save(flags); + cpuhw = &__get_cpu_var(cpu_hw_counters); + + ret = cpuhw->disabled; + if (!ret) { + cpuhw->disabled = 1; + cpuhw->n_added = 0; + + /* + * Check if we ever enabled the PMU on this cpu. + */ + if (!cpuhw->pmcs_enabled) { + if (ppc_md.enable_pmcs) + ppc_md.enable_pmcs(); + cpuhw->pmcs_enabled = 1; + } + + /* + * Set the 'freeze counters' bit. + * The barrier is to make sure the mtspr has been + * executed and the PMU has frozen the counters + * before we return. + */ + mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) | MMCR0_FC); + mb(); + } + local_irq_restore(flags); + return ret; +} + +/* + * Re-enable all counters if disable == 0. + * If we were previously disabled and counters were added, then + * put the new config on the PMU. + */ +void hw_perf_restore(u64 disable) +{ + struct perf_counter *counter; + struct cpu_hw_counters *cpuhw; + unsigned long flags; + long i; + unsigned long val; + s64 left; + unsigned int hwc_index[MAX_HWCOUNTERS]; + + if (disable) + return; + local_irq_save(flags); + cpuhw = &__get_cpu_var(cpu_hw_counters); + cpuhw->disabled = 0; + + /* + * If we didn't change anything, or only removed counters, + * no need to recalculate MMCR* settings and reset the PMCs. + * Just reenable the PMU with the current MMCR* settings + * (possibly updated for removal of counters). + */ + if (!cpuhw->n_added) { + mtspr(SPRN_MMCRA, cpuhw->mmcr[2]); + mtspr(SPRN_MMCR1, cpuhw->mmcr[1]); + mtspr(SPRN_MMCR0, cpuhw->mmcr[0]); + if (cpuhw->n_counters == 0) + get_lppaca()->pmcregs_in_use = 0; + goto out; + } + + /* + * Compute MMCR* values for the new set of counters + */ + if (ppmu->compute_mmcr(cpuhw->events, cpuhw->n_counters, hwc_index, + cpuhw->mmcr)) { + /* shouldn't ever get here */ + printk(KERN_ERR "oops compute_mmcr failed\n"); + goto out; + } + + /* + * Add in MMCR0 freeze bits corresponding to the + * hw_event.exclude_* bits for the first counter. + * We have already checked that all counters have the + * same values for these bits as the first counter. + */ + counter = cpuhw->counter[0]; + if (counter->hw_event.exclude_user) + cpuhw->mmcr[0] |= MMCR0_FCP; + if (counter->hw_event.exclude_kernel) + cpuhw->mmcr[0] |= freeze_counters_kernel; + if (counter->hw_event.exclude_hv) + cpuhw->mmcr[0] |= MMCR0_FCHV; + + /* + * Write the new configuration to MMCR* with the freeze + * bit set and set the hardware counters to their initial values. + * Then unfreeze the counters. + */ + get_lppaca()->pmcregs_in_use = 1; + mtspr(SPRN_MMCRA, cpuhw->mmcr[2]); + mtspr(SPRN_MMCR1, cpuhw->mmcr[1]); + mtspr(SPRN_MMCR0, (cpuhw->mmcr[0] & ~(MMCR0_PMC1CE | MMCR0_PMCjCE)) + | MMCR0_FC); + + /* + * Read off any pre-existing counters that need to move + * to another PMC. + */ + for (i = 0; i < cpuhw->n_counters; ++i) { + counter = cpuhw->counter[i]; + if (counter->hw.idx && counter->hw.idx != hwc_index[i] + 1) { + power_perf_read(counter); + write_pmc(counter->hw.idx, 0); + counter->hw.idx = 0; + } + } + + /* + * Initialize the PMCs for all the new and moved counters. + */ + for (i = 0; i < cpuhw->n_counters; ++i) { + counter = cpuhw->counter[i]; + if (counter->hw.idx) + continue; + val = 0; + if (counter->hw_event.irq_period) { + left = atomic64_read(&counter->hw.period_left); + if (left < 0x80000000L) + val = 0x80000000L - left; + } + atomic64_set(&counter->hw.prev_count, val); + counter->hw.idx = hwc_index[i] + 1; + write_pmc(counter->hw.idx, val); + perf_counter_update_userpage(counter); + } + mb(); + cpuhw->mmcr[0] |= MMCR0_PMXE | MMCR0_FCECE; + mtspr(SPRN_MMCR0, cpuhw->mmcr[0]); + + out: + local_irq_restore(flags); +} + +static int collect_events(struct perf_counter *group, int max_count, + struct perf_counter *ctrs[], unsigned int *events) +{ + int n = 0; + struct perf_counter *counter; + + if (!is_software_counter(group)) { + if (n >= max_count) + return -1; + ctrs[n] = group; + events[n++] = group->hw.config; + } + list_for_each_entry(counter, &group->sibling_list, list_entry) { + if (!is_software_counter(counter) && + counter->state != PERF_COUNTER_STATE_OFF) { + if (n >= max_count) + return -1; + ctrs[n] = counter; + events[n++] = counter->hw.config; + } + } + return n; +} + +static void counter_sched_in(struct perf_counter *counter, int cpu) +{ + counter->state = PERF_COUNTER_STATE_ACTIVE; + counter->oncpu = cpu; + counter->tstamp_running += counter->ctx->time_now - + counter->tstamp_stopped; + if (is_software_counter(counter)) + counter->hw_ops->enable(counter); +} + +/* + * Called to enable a whole group of counters. + * Returns 1 if the group was enabled, or -EAGAIN if it could not be. + * Assumes the caller has disabled interrupts and has + * frozen the PMU with hw_perf_save_disable. + */ +int hw_perf_group_sched_in(struct perf_counter *group_leader, + struct perf_cpu_context *cpuctx, + struct perf_counter_context *ctx, int cpu) +{ + struct cpu_hw_counters *cpuhw; + long i, n, n0; + struct perf_counter *sub; + + cpuhw = &__get_cpu_var(cpu_hw_counters); + n0 = cpuhw->n_counters; + n = collect_events(group_leader, ppmu->n_counter - n0, + &cpuhw->counter[n0], &cpuhw->events[n0]); + if (n < 0) + return -EAGAIN; + if (check_excludes(cpuhw->counter, n0, n)) + return -EAGAIN; + if (power_check_constraints(cpuhw->events, n + n0)) + return -EAGAIN; + cpuhw->n_counters = n0 + n; + cpuhw->n_added += n; + + /* + * OK, this group can go on; update counter states etc., + * and enable any software counters + */ + for (i = n0; i < n0 + n; ++i) + cpuhw->counter[i]->hw.config = cpuhw->events[i]; + cpuctx->active_oncpu += n; + n = 1; + counter_sched_in(group_leader, cpu); + list_for_each_entry(sub, &group_leader->sibling_list, list_entry) { + if (sub->state != PERF_COUNTER_STATE_OFF) { + counter_sched_in(sub, cpu); + ++n; + } + } + ctx->nr_active += n; + + return 1; +} + +/* + * Add a counter to the PMU. + * If all counters are not already frozen, then we disable and + * re-enable the PMU in order to get hw_perf_restore to do the + * actual work of reconfiguring the PMU. + */ +static int power_perf_enable(struct perf_counter *counter) +{ + struct cpu_hw_counters *cpuhw; + unsigned long flags; + u64 pmudis; + int n0; + int ret = -EAGAIN; + + local_irq_save(flags); + pmudis = hw_perf_save_disable(); + + /* + * Add the counter to the list (if there is room) + * and check whether the total set is still feasible. + */ + cpuhw = &__get_cpu_var(cpu_hw_counters); + n0 = cpuhw->n_counters; + if (n0 >= ppmu->n_counter) + goto out; + cpuhw->counter[n0] = counter; + cpuhw->events[n0] = counter->hw.config; + if (check_excludes(cpuhw->counter, n0, 1)) + goto out; + if (power_check_constraints(cpuhw->events, n0 + 1)) + goto out; + + counter->hw.config = cpuhw->events[n0]; + ++cpuhw->n_counters; + ++cpuhw->n_added; + + ret = 0; + out: + hw_perf_restore(pmudis); + local_irq_restore(flags); + return ret; +} + +/* + * Remove a counter from the PMU. + */ +static void power_perf_disable(struct perf_counter *counter) +{ + struct cpu_hw_counters *cpuhw; + long i; + u64 pmudis; + unsigned long flags; + + local_irq_save(flags); + pmudis = hw_perf_save_disable(); + + power_perf_read(counter); + + cpuhw = &__get_cpu_var(cpu_hw_counters); + for (i = 0; i < cpuhw->n_counters; ++i) { + if (counter == cpuhw->counter[i]) { + while (++i < cpuhw->n_counters) + cpuhw->counter[i-1] = cpuhw->counter[i]; + --cpuhw->n_counters; + ppmu->disable_pmc(counter->hw.idx - 1, cpuhw->mmcr); + write_pmc(counter->hw.idx, 0); + counter->hw.idx = 0; + perf_counter_update_userpage(counter); + break; + } + } + if (cpuhw->n_counters == 0) { + /* disable exceptions if no counters are running */ + cpuhw->mmcr[0] &= ~(MMCR0_PMXE | MMCR0_FCECE); + } + + hw_perf_restore(pmudis); + local_irq_restore(flags); +} + +struct hw_perf_counter_ops power_perf_ops = { + .enable = power_perf_enable, + .disable = power_perf_disable, + .read = power_perf_read +}; + +const struct hw_perf_counter_ops * +hw_perf_counter_init(struct perf_counter *counter) +{ + unsigned long ev; + struct perf_counter *ctrs[MAX_HWCOUNTERS]; + unsigned int events[MAX_HWCOUNTERS]; + int n; + + if (!ppmu) + return NULL; + if ((s64)counter->hw_event.irq_period < 0) + return NULL; + if (!perf_event_raw(&counter->hw_event)) { + ev = perf_event_id(&counter->hw_event); + if (ev >= ppmu->n_generic || ppmu->generic_events[ev] == 0) + return NULL; + ev = ppmu->generic_events[ev]; + } else { + ev = perf_event_config(&counter->hw_event); + } + counter->hw.config_base = ev; + counter->hw.idx = 0; + + /* + * If we are not running on a hypervisor, force the + * exclude_hv bit to 0 so that we don't care what + * the user set it to. + */ + if (!firmware_has_feature(FW_FEATURE_LPAR)) + counter->hw_event.exclude_hv = 0; + + /* + * If this is in a group, check if it can go on with all the + * other hardware counters in the group. We assume the counter + * hasn't been linked into its leader's sibling list at this point. + */ + n = 0; + if (counter->group_leader != counter) { + n = collect_events(counter->group_leader, ppmu->n_counter - 1, + ctrs, events); + if (n < 0) + return NULL; + } + events[n] = ev; + ctrs[n] = counter; + if (check_excludes(ctrs, n, 1)) + return NULL; + if (power_check_constraints(events, n + 1)) + return NULL; + + counter->hw.config = events[n]; + atomic64_set(&counter->hw.period_left, counter->hw_event.irq_period); + return &power_perf_ops; +} + +/* + * Handle wakeups. + */ +void perf_counter_do_pending(void) +{ + int i; + struct cpu_hw_counters *cpuhw = &__get_cpu_var(cpu_hw_counters); + struct perf_counter *counter; + + for (i = 0; i < cpuhw->n_counters; ++i) { + counter = cpuhw->counter[i]; + if (counter && counter->wakeup_pending) { + counter->wakeup_pending = 0; + wake_up(&counter->waitq); + } + } +} + +/* + * A counter has overflowed; update its count and record + * things if requested. Note that interrupts are hard-disabled + * here so there is no possibility of being interrupted. + */ +static void record_and_restart(struct perf_counter *counter, long val, + struct pt_regs *regs) +{ + s64 prev, delta, left; + int record = 0; + + /* we don't have to worry about interrupts here */ + prev = atomic64_read(&counter->hw.prev_count); + delta = (val - prev) & 0xfffffffful; + atomic64_add(delta, &counter->count); + + /* + * See if the total period for this counter has expired, + * and update for the next period. + */ + val = 0; + left = atomic64_read(&counter->hw.period_left) - delta; + if (counter->hw_event.irq_period) { + if (left <= 0) { + left += counter->hw_event.irq_period; + if (left <= 0) + left = counter->hw_event.irq_period; + record = 1; + } + if (left < 0x80000000L) + val = 0x80000000L - left; + } + write_pmc(counter->hw.idx, val); + atomic64_set(&counter->hw.prev_count, val); + atomic64_set(&counter->hw.period_left, left); + perf_counter_update_userpage(counter); + + /* + * Finally record data if requested. + */ + if (record) + perf_counter_output(counter, 1, regs); +} + +/* + * Performance monitor interrupt stuff + */ +static void perf_counter_interrupt(struct pt_regs *regs) +{ + int i; + struct cpu_hw_counters *cpuhw = &__get_cpu_var(cpu_hw_counters); + struct perf_counter *counter; + long val; + int need_wakeup = 0, found = 0; + + for (i = 0; i < cpuhw->n_counters; ++i) { + counter = cpuhw->counter[i]; + val = read_pmc(counter->hw.idx); + if ((int)val < 0) { + /* counter has overflowed */ + found = 1; + record_and_restart(counter, val, regs); + } + } + + /* + * In case we didn't find and reset the counter that caused + * the interrupt, scan all counters and reset any that are + * negative, to avoid getting continual interrupts. + * Any that we processed in the previous loop will not be negative. + */ + if (!found) { + for (i = 0; i < ppmu->n_counter; ++i) { + val = read_pmc(i + 1); + if ((int)val < 0) + write_pmc(i + 1, 0); + } + } + + /* + * Reset MMCR0 to its normal value. This will set PMXE and + * clear FC (freeze counters) and PMAO (perf mon alert occurred) + * and thus allow interrupts to occur again. + * XXX might want to use MSR.PM to keep the counters frozen until + * we get back out of this interrupt. + */ + mtspr(SPRN_MMCR0, cpuhw->mmcr[0]); + + /* + * If we need a wakeup, check whether interrupts were soft-enabled + * when we took the interrupt. If they were, we can wake stuff up + * immediately; otherwise we'll have do the wakeup when interrupts + * get soft-enabled. + */ + if (get_perf_counter_pending() && regs->softe) { + irq_enter(); + clear_perf_counter_pending(); + perf_counter_do_pending(); + irq_exit(); + } +} + +void hw_perf_counter_setup(int cpu) +{ + struct cpu_hw_counters *cpuhw = &per_cpu(cpu_hw_counters, cpu); + + memset(cpuhw, 0, sizeof(*cpuhw)); + cpuhw->mmcr[0] = MMCR0_FC; +} + +extern struct power_pmu power4_pmu; +extern struct power_pmu ppc970_pmu; +extern struct power_pmu power5_pmu; +extern struct power_pmu power5p_pmu; +extern struct power_pmu power6_pmu; + +static int init_perf_counters(void) +{ + unsigned long pvr; + + if (reserve_pmc_hardware(perf_counter_interrupt)) { + printk(KERN_ERR "Couldn't init performance monitor subsystem\n"); + return -EBUSY; + } + + /* XXX should get this from cputable */ + pvr = mfspr(SPRN_PVR); + switch (PVR_VER(pvr)) { + case PV_POWER4: + case PV_POWER4p: + ppmu = &power4_pmu; + break; + case PV_970: + case PV_970FX: + case PV_970MP: + ppmu = &ppc970_pmu; + break; + case PV_POWER5: + ppmu = &power5_pmu; + break; + case PV_POWER5p: + ppmu = &power5p_pmu; + break; + case 0x3e: + ppmu = &power6_pmu; + break; + } + + /* + * Use FCHV to ignore kernel events if MSR.HV is set. + */ + if (mfmsr() & MSR_HV) + freeze_counters_kernel = MMCR0_FCHV; + + return 0; +} + +arch_initcall(init_perf_counters); Index: linux-2.6-tip/arch/powerpc/kernel/power4-pmu.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/kernel/power4-pmu.c @@ -0,0 +1,557 @@ +/* + * Performance counter support for POWER4 (GP) and POWER4+ (GQ) processors. + * + * Copyright 2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include + +/* + * Bits in event code for POWER4 + */ +#define PM_PMC_SH 12 /* PMC number (1-based) for direct events */ +#define PM_PMC_MSK 0xf +#define PM_UNIT_SH 8 /* TTMMUX number and setting - unit select */ +#define PM_UNIT_MSK 0xf +#define PM_LOWER_SH 6 +#define PM_LOWER_MSK 1 +#define PM_LOWER_MSKS 0x40 +#define PM_BYTE_SH 4 /* Byte number of event bus to use */ +#define PM_BYTE_MSK 3 +#define PM_PMCSEL_MSK 7 + +/* + * Unit code values + */ +#define PM_FPU 1 +#define PM_ISU1 2 +#define PM_IFU 3 +#define PM_IDU0 4 +#define PM_ISU1_ALT 6 +#define PM_ISU2 7 +#define PM_IFU_ALT 8 +#define PM_LSU0 9 +#define PM_LSU1 0xc +#define PM_GPS 0xf + +/* + * Bits in MMCR0 for POWER4 + */ +#define MMCR0_PMC1SEL_SH 8 +#define MMCR0_PMC2SEL_SH 1 +#define MMCR_PMCSEL_MSK 0x1f + +/* + * Bits in MMCR1 for POWER4 + */ +#define MMCR1_TTM0SEL_SH 62 +#define MMCR1_TTC0SEL_SH 61 +#define MMCR1_TTM1SEL_SH 59 +#define MMCR1_TTC1SEL_SH 58 +#define MMCR1_TTM2SEL_SH 56 +#define MMCR1_TTC2SEL_SH 55 +#define MMCR1_TTM3SEL_SH 53 +#define MMCR1_TTC3SEL_SH 52 +#define MMCR1_TTMSEL_MSK 3 +#define MMCR1_TD_CP_DBG0SEL_SH 50 +#define MMCR1_TD_CP_DBG1SEL_SH 48 +#define MMCR1_TD_CP_DBG2SEL_SH 46 +#define MMCR1_TD_CP_DBG3SEL_SH 44 +#define MMCR1_DEBUG0SEL_SH 43 +#define MMCR1_DEBUG1SEL_SH 42 +#define MMCR1_DEBUG2SEL_SH 41 +#define MMCR1_DEBUG3SEL_SH 40 +#define MMCR1_PMC1_ADDER_SEL_SH 39 +#define MMCR1_PMC2_ADDER_SEL_SH 38 +#define MMCR1_PMC6_ADDER_SEL_SH 37 +#define MMCR1_PMC5_ADDER_SEL_SH 36 +#define MMCR1_PMC8_ADDER_SEL_SH 35 +#define MMCR1_PMC7_ADDER_SEL_SH 34 +#define MMCR1_PMC3_ADDER_SEL_SH 33 +#define MMCR1_PMC4_ADDER_SEL_SH 32 +#define MMCR1_PMC3SEL_SH 27 +#define MMCR1_PMC4SEL_SH 22 +#define MMCR1_PMC5SEL_SH 17 +#define MMCR1_PMC6SEL_SH 12 +#define MMCR1_PMC7SEL_SH 7 +#define MMCR1_PMC8SEL_SH 2 /* note bit 0 is in MMCRA for GP */ + +static short mmcr1_adder_bits[8] = { + MMCR1_PMC1_ADDER_SEL_SH, + MMCR1_PMC2_ADDER_SEL_SH, + MMCR1_PMC3_ADDER_SEL_SH, + MMCR1_PMC4_ADDER_SEL_SH, + MMCR1_PMC5_ADDER_SEL_SH, + MMCR1_PMC6_ADDER_SEL_SH, + MMCR1_PMC7_ADDER_SEL_SH, + MMCR1_PMC8_ADDER_SEL_SH +}; + +/* + * Bits in MMCRA + */ +#define MMCRA_PMC8SEL0_SH 17 /* PMC8SEL bit 0 for GP */ + +/* + * Layout of constraint bits: + * 6666555555555544444444443333333333222222222211111111110000000000 + * 3210987654321098765432109876543210987654321098765432109876543210 + * |[ >[ >[ >|||[ >[ >< >< >< >< ><><><><><><><><> + * | UC1 UC2 UC3 ||| PS1 PS2 B0 B1 B2 B3 P1P2P3P4P5P6P7P8 + * \SMPL ||\TTC3SEL + * |\TTC_IFU_SEL + * \TTM2SEL0 + * + * SMPL - SAMPLE_ENABLE constraint + * 56: SAMPLE_ENABLE value 0x0100_0000_0000_0000 + * + * UC1 - unit constraint 1: can't have all three of FPU/ISU1/IDU0|ISU2 + * 55: UC1 error 0x0080_0000_0000_0000 + * 54: FPU events needed 0x0040_0000_0000_0000 + * 53: ISU1 events needed 0x0020_0000_0000_0000 + * 52: IDU0|ISU2 events needed 0x0010_0000_0000_0000 + * + * UC2 - unit constraint 2: can't have all three of FPU/IFU/LSU0 + * 51: UC2 error 0x0008_0000_0000_0000 + * 50: FPU events needed 0x0004_0000_0000_0000 + * 49: IFU events needed 0x0002_0000_0000_0000 + * 48: LSU0 events needed 0x0001_0000_0000_0000 + * + * UC3 - unit constraint 3: can't have all four of LSU0/IFU/IDU0|ISU2/ISU1 + * 47: UC3 error 0x8000_0000_0000 + * 46: LSU0 events needed 0x4000_0000_0000 + * 45: IFU events needed 0x2000_0000_0000 + * 44: IDU0|ISU2 events needed 0x1000_0000_0000 + * 43: ISU1 events needed 0x0800_0000_0000 + * + * TTM2SEL0 + * 42: 0 = IDU0 events needed + * 1 = ISU2 events needed 0x0400_0000_0000 + * + * TTC_IFU_SEL + * 41: 0 = IFU.U events needed + * 1 = IFU.L events needed 0x0200_0000_0000 + * + * TTC3SEL + * 40: 0 = LSU1.U events needed + * 1 = LSU1.L events needed 0x0100_0000_0000 + * + * PS1 + * 39: PS1 error 0x0080_0000_0000 + * 36-38: count of events needing PMC1/2/5/6 0x0070_0000_0000 + * + * PS2 + * 35: PS2 error 0x0008_0000_0000 + * 32-34: count of events needing PMC3/4/7/8 0x0007_0000_0000 + * + * B0 + * 28-31: Byte 0 event source 0xf000_0000 + * 1 = FPU + * 2 = ISU1 + * 3 = IFU + * 4 = IDU0 + * 7 = ISU2 + * 9 = LSU0 + * c = LSU1 + * f = GPS + * + * B1, B2, B3 + * 24-27, 20-23, 16-19: Byte 1, 2, 3 event sources + * + * P8 + * 15: P8 error 0x8000 + * 14-15: Count of events needing PMC8 + * + * P1..P7 + * 0-13: Count of events needing PMC1..PMC7 + * + * Note: this doesn't allow events using IFU.U to be combined with events + * using IFU.L, though that is feasible (using TTM0 and TTM2). However + * there are no listed events for IFU.L (they are debug events not + * verified for performance monitoring) so this shouldn't cause a + * problem. + */ + +static struct unitinfo { + u64 value, mask; + int unit; + int lowerbit; +} p4_unitinfo[16] = { + [PM_FPU] = { 0x44000000000000ull, 0x88000000000000ull, PM_FPU, 0 }, + [PM_ISU1] = { 0x20080000000000ull, 0x88000000000000ull, PM_ISU1, 0 }, + [PM_ISU1_ALT] = + { 0x20080000000000ull, 0x88000000000000ull, PM_ISU1, 0 }, + [PM_IFU] = { 0x02200000000000ull, 0x08820000000000ull, PM_IFU, 41 }, + [PM_IFU_ALT] = + { 0x02200000000000ull, 0x08820000000000ull, PM_IFU, 41 }, + [PM_IDU0] = { 0x10100000000000ull, 0x80840000000000ull, PM_IDU0, 1 }, + [PM_ISU2] = { 0x10140000000000ull, 0x80840000000000ull, PM_ISU2, 0 }, + [PM_LSU0] = { 0x01400000000000ull, 0x08800000000000ull, PM_LSU0, 0 }, + [PM_LSU1] = { 0x00000000000000ull, 0x00010000000000ull, PM_LSU1, 40 }, + [PM_GPS] = { 0x00000000000000ull, 0x00000000000000ull, PM_GPS, 0 } +}; + +static unsigned char direct_marked_event[8] = { + (1<<2) | (1<<3), /* PMC1: PM_MRK_GRP_DISP, PM_MRK_ST_CMPL */ + (1<<3) | (1<<5), /* PMC2: PM_THRESH_TIMEO, PM_MRK_BRU_FIN */ + (1<<3), /* PMC3: PM_MRK_ST_CMPL_INT */ + (1<<4) | (1<<5), /* PMC4: PM_MRK_GRP_CMPL, PM_MRK_CRU_FIN */ + (1<<4) | (1<<5), /* PMC5: PM_MRK_GRP_TIMEO */ + (1<<3) | (1<<4) | (1<<5), + /* PMC6: PM_MRK_ST_GPS, PM_MRK_FXU_FIN, PM_MRK_GRP_ISSUED */ + (1<<4) | (1<<5), /* PMC7: PM_MRK_FPU_FIN, PM_MRK_INST_FIN */ + (1<<4), /* PMC8: PM_MRK_LSU_FIN */ +}; + +/* + * Returns 1 if event counts things relating to marked instructions + * and thus needs the MMCRA_SAMPLE_ENABLE bit set, or 0 if not. + */ +static int p4_marked_instr_event(unsigned int event) +{ + int pmc, psel, unit, byte, bit; + unsigned int mask; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + psel = event & PM_PMCSEL_MSK; + if (pmc) { + if (direct_marked_event[pmc - 1] & (1 << psel)) + return 1; + if (psel == 0) /* add events */ + bit = (pmc <= 4)? pmc - 1: 8 - pmc; + else if (psel == 6) /* decode events */ + bit = 4; + else + return 0; + } else + bit = psel; + + byte = (event >> PM_BYTE_SH) & PM_BYTE_MSK; + unit = (event >> PM_UNIT_SH) & PM_UNIT_MSK; + mask = 0; + switch (unit) { + case PM_LSU1: + if (event & PM_LOWER_MSKS) + mask = 1 << 28; /* byte 7 bit 4 */ + else + mask = 6 << 24; /* byte 3 bits 1 and 2 */ + break; + case PM_LSU0: + /* byte 3, bit 3; byte 2 bits 0,2,3,4,5; byte 1 */ + mask = 0x083dff00; + } + return (mask >> (byte * 8 + bit)) & 1; +} + +static int p4_get_constraint(unsigned int event, u64 *maskp, u64 *valp) +{ + int pmc, byte, unit, lower, sh; + u64 mask = 0, value = 0; + int grp = -1; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 8) + return -1; + sh = (pmc - 1) * 2; + mask |= 2 << sh; + value |= 1 << sh; + grp = ((pmc - 1) >> 1) & 1; + } + unit = (event >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event >> PM_BYTE_SH) & PM_BYTE_MSK; + if (unit) { + lower = (event >> PM_LOWER_SH) & PM_LOWER_MSK; + + /* + * Bus events on bytes 0 and 2 can be counted + * on PMC1/2/5/6; bytes 1 and 3 on PMC3/4/7/8. + */ + if (!pmc) + grp = byte & 1; + + if (!p4_unitinfo[unit].unit) + return -1; + mask |= p4_unitinfo[unit].mask; + value |= p4_unitinfo[unit].value; + sh = p4_unitinfo[unit].lowerbit; + if (sh > 1) + value |= (u64)lower << sh; + else if (lower != sh) + return -1; + unit = p4_unitinfo[unit].unit; + + /* Set byte lane select field */ + mask |= 0xfULL << (28 - 4 * byte); + value |= (u64)unit << (28 - 4 * byte); + } + if (grp == 0) { + /* increment PMC1/2/5/6 field */ + mask |= 0x8000000000ull; + value |= 0x1000000000ull; + } else { + /* increment PMC3/4/7/8 field */ + mask |= 0x800000000ull; + value |= 0x100000000ull; + } + + /* Marked instruction events need sample_enable set */ + if (p4_marked_instr_event(event)) { + mask |= 1ull << 56; + value |= 1ull << 56; + } + + /* PMCSEL=6 decode events on byte 2 need sample_enable clear */ + if (pmc && (event & PM_PMCSEL_MSK) == 6 && byte == 2) + mask |= 1ull << 56; + + *maskp = mask; + *valp = value; + return 0; +} + +static unsigned int ppc_inst_cmpl[] = { + 0x1001, 0x4001, 0x6001, 0x7001, 0x8001 +}; + +static int p4_get_alternatives(unsigned int event, unsigned int alt[]) +{ + int i, j, na; + + alt[0] = event; + na = 1; + + /* 2 possibilities for PM_GRP_DISP_REJECT */ + if (event == 0x8003 || event == 0x0224) { + alt[1] = event ^ (0x8003 ^ 0x0224); + return 2; + } + + /* 2 possibilities for PM_ST_MISS_L1 */ + if (event == 0x0c13 || event == 0x0c23) { + alt[1] = event ^ (0x0c13 ^ 0x0c23); + return 2; + } + + /* several possibilities for PM_INST_CMPL */ + for (i = 0; i < ARRAY_SIZE(ppc_inst_cmpl); ++i) { + if (event == ppc_inst_cmpl[i]) { + for (j = 0; j < ARRAY_SIZE(ppc_inst_cmpl); ++j) + if (j != i) + alt[na++] = ppc_inst_cmpl[j]; + break; + } + } + + return na; +} + +static int p4_compute_mmcr(unsigned int event[], int n_ev, + unsigned int hwc[], u64 mmcr[]) +{ + u64 mmcr0 = 0, mmcr1 = 0, mmcra = 0; + unsigned int pmc, unit, byte, psel, lower; + unsigned int ttm, grp; + unsigned int pmc_inuse = 0; + unsigned int pmc_grp_use[2]; + unsigned char busbyte[4]; + unsigned char unituse[16]; + unsigned int unitlower = 0; + int i; + + if (n_ev > 8) + return -1; + + /* First pass to count resource use */ + pmc_grp_use[0] = pmc_grp_use[1] = 0; + memset(busbyte, 0, sizeof(busbyte)); + memset(unituse, 0, sizeof(unituse)); + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc_inuse & (1 << (pmc - 1))) + return -1; + pmc_inuse |= 1 << (pmc - 1); + /* count 1/2/5/6 vs 3/4/7/8 use */ + ++pmc_grp_use[((pmc - 1) >> 1) & 1]; + } + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + lower = (event[i] >> PM_LOWER_SH) & PM_LOWER_MSK; + if (unit) { + if (!pmc) + ++pmc_grp_use[byte & 1]; + if (unit == 6 || unit == 8) + /* map alt ISU1/IFU codes: 6->2, 8->3 */ + unit = (unit >> 1) - 1; + if (busbyte[byte] && busbyte[byte] != unit) + return -1; + busbyte[byte] = unit; + lower <<= unit; + if (unituse[unit] && lower != (unitlower & lower)) + return -1; + unituse[unit] = 1; + unitlower |= lower; + } + } + if (pmc_grp_use[0] > 4 || pmc_grp_use[1] > 4) + return -1; + + /* + * Assign resources and set multiplexer selects. + * + * Units 1,2,3 are on TTM0, 4,6,7 on TTM1, 8,10 on TTM2. + * Each TTMx can only select one unit, but since + * units 2 and 6 are both ISU1, and 3 and 8 are both IFU, + * we have some choices. + */ + if (unituse[2] & (unituse[1] | (unituse[3] & unituse[9]))) { + unituse[6] = 1; /* Move 2 to 6 */ + unituse[2] = 0; + } + if (unituse[3] & (unituse[1] | unituse[2])) { + unituse[8] = 1; /* Move 3 to 8 */ + unituse[3] = 0; + unitlower = (unitlower & ~8) | ((unitlower & 8) << 5); + } + /* Check only one unit per TTMx */ + if (unituse[1] + unituse[2] + unituse[3] > 1 || + unituse[4] + unituse[6] + unituse[7] > 1 || + unituse[8] + unituse[9] > 1 || + (unituse[5] | unituse[10] | unituse[11] | + unituse[13] | unituse[14])) + return -1; + + /* Set TTMxSEL fields. Note, units 1-3 => TTM0SEL codes 0-2 */ + mmcr1 |= (u64)(unituse[3] * 2 + unituse[2]) << MMCR1_TTM0SEL_SH; + mmcr1 |= (u64)(unituse[7] * 3 + unituse[6] * 2) << MMCR1_TTM1SEL_SH; + mmcr1 |= (u64)unituse[9] << MMCR1_TTM2SEL_SH; + + /* Set TTCxSEL fields. */ + if (unitlower & 0xe) + mmcr1 |= 1ull << MMCR1_TTC0SEL_SH; + if (unitlower & 0xf0) + mmcr1 |= 1ull << MMCR1_TTC1SEL_SH; + if (unitlower & 0xf00) + mmcr1 |= 1ull << MMCR1_TTC2SEL_SH; + if (unitlower & 0x7000) + mmcr1 |= 1ull << MMCR1_TTC3SEL_SH; + + /* Set byte lane select fields. */ + for (byte = 0; byte < 4; ++byte) { + unit = busbyte[byte]; + if (!unit) + continue; + if (unit == 0xf) { + /* special case for GPS */ + mmcr1 |= 1ull << (MMCR1_DEBUG0SEL_SH - byte); + } else { + if (!unituse[unit]) + ttm = unit - 1; /* 2->1, 3->2 */ + else + ttm = unit >> 2; + mmcr1 |= (u64)ttm << (MMCR1_TD_CP_DBG0SEL_SH - 2*byte); + } + } + + /* Second pass: assign PMCs, set PMCxSEL and PMCx_ADDER_SEL fields */ + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + psel = event[i] & PM_PMCSEL_MSK; + if (!pmc) { + /* Bus event or 00xxx direct event (off or cycles) */ + if (unit) + psel |= 0x10 | ((byte & 2) << 2); + for (pmc = 0; pmc < 8; ++pmc) { + if (pmc_inuse & (1 << pmc)) + continue; + grp = (pmc >> 1) & 1; + if (unit) { + if (grp == (byte & 1)) + break; + } else if (pmc_grp_use[grp] < 4) { + ++pmc_grp_use[grp]; + break; + } + } + pmc_inuse |= 1 << pmc; + } else { + /* Direct event */ + --pmc; + if (psel == 0 && (byte & 2)) + /* add events on higher-numbered bus */ + mmcr1 |= 1ull << mmcr1_adder_bits[pmc]; + else if (psel == 6 && byte == 3) + /* seem to need to set sample_enable here */ + mmcra |= MMCRA_SAMPLE_ENABLE; + psel |= 8; + } + if (pmc <= 1) + mmcr0 |= psel << (MMCR0_PMC1SEL_SH - 7 * pmc); + else + mmcr1 |= psel << (MMCR1_PMC3SEL_SH - 5 * (pmc - 2)); + if (pmc == 7) /* PMC8 */ + mmcra |= (psel & 1) << MMCRA_PMC8SEL0_SH; + hwc[i] = pmc; + if (p4_marked_instr_event(event[i])) + mmcra |= MMCRA_SAMPLE_ENABLE; + } + + if (pmc_inuse & 1) + mmcr0 |= MMCR0_PMC1CE; + if (pmc_inuse & 0xfe) + mmcr0 |= MMCR0_PMCjCE; + + mmcra |= 0x2000; /* mark only one IOP per PPC instruction */ + + /* Return MMCRx values */ + mmcr[0] = mmcr0; + mmcr[1] = mmcr1; + mmcr[2] = mmcra; + return 0; +} + +static void p4_disable_pmc(unsigned int pmc, u64 mmcr[]) +{ + /* + * Setting the PMCxSEL field to 0 disables PMC x. + * (Note that pmc is 0-based here, not 1-based.) + */ + if (pmc <= 1) { + mmcr[0] &= ~(0x1fUL << (MMCR0_PMC1SEL_SH - 7 * pmc)); + } else { + mmcr[1] &= ~(0x1fUL << (MMCR1_PMC3SEL_SH - 5 * (pmc - 2))); + if (pmc == 7) + mmcr[2] &= ~(1UL << MMCRA_PMC8SEL0_SH); + } +} + +static int p4_generic_events[] = { + [PERF_COUNT_CPU_CYCLES] = 7, + [PERF_COUNT_INSTRUCTIONS] = 0x1001, + [PERF_COUNT_CACHE_REFERENCES] = 0x8c10, /* PM_LD_REF_L1 */ + [PERF_COUNT_CACHE_MISSES] = 0x3c10, /* PM_LD_MISS_L1 */ + [PERF_COUNT_BRANCH_INSTRUCTIONS] = 0x330, /* PM_BR_ISSUED */ + [PERF_COUNT_BRANCH_MISSES] = 0x331, /* PM_BR_MPRED_CR */ +}; + +struct power_pmu power4_pmu = { + .n_counter = 8, + .max_alternatives = 5, + .add_fields = 0x0000001100005555ull, + .test_adder = 0x0011083300000000ull, + .compute_mmcr = p4_compute_mmcr, + .get_constraint = p4_get_constraint, + .get_alternatives = p4_get_alternatives, + .disable_pmc = p4_disable_pmc, + .n_generic = ARRAY_SIZE(p4_generic_events), + .generic_events = p4_generic_events, +}; Index: linux-2.6-tip/arch/powerpc/kernel/power5+-pmu.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/kernel/power5+-pmu.c @@ -0,0 +1,452 @@ +/* + * Performance counter support for POWER5 (not POWER5++) processors. + * + * Copyright 2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include + +/* + * Bits in event code for POWER5+ (POWER5 GS) and POWER5++ (POWER5 GS DD3) + */ +#define PM_PMC_SH 20 /* PMC number (1-based) for direct events */ +#define PM_PMC_MSK 0xf +#define PM_PMC_MSKS (PM_PMC_MSK << PM_PMC_SH) +#define PM_UNIT_SH 16 /* TTMMUX number and setting - unit select */ +#define PM_UNIT_MSK 0xf +#define PM_BYTE_SH 12 /* Byte number of event bus to use */ +#define PM_BYTE_MSK 7 +#define PM_GRS_SH 8 /* Storage subsystem mux select */ +#define PM_GRS_MSK 7 +#define PM_BUSEVENT_MSK 0x80 /* Set if event uses event bus */ +#define PM_PMCSEL_MSK 0x7f + +/* Values in PM_UNIT field */ +#define PM_FPU 0 +#define PM_ISU0 1 +#define PM_IFU 2 +#define PM_ISU1 3 +#define PM_IDU 4 +#define PM_ISU0_ALT 6 +#define PM_GRS 7 +#define PM_LSU0 8 +#define PM_LSU1 0xc +#define PM_LASTUNIT 0xc + +/* + * Bits in MMCR1 for POWER5+ + */ +#define MMCR1_TTM0SEL_SH 62 +#define MMCR1_TTM1SEL_SH 60 +#define MMCR1_TTM2SEL_SH 58 +#define MMCR1_TTM3SEL_SH 56 +#define MMCR1_TTMSEL_MSK 3 +#define MMCR1_TD_CP_DBG0SEL_SH 54 +#define MMCR1_TD_CP_DBG1SEL_SH 52 +#define MMCR1_TD_CP_DBG2SEL_SH 50 +#define MMCR1_TD_CP_DBG3SEL_SH 48 +#define MMCR1_GRS_L2SEL_SH 46 +#define MMCR1_GRS_L2SEL_MSK 3 +#define MMCR1_GRS_L3SEL_SH 44 +#define MMCR1_GRS_L3SEL_MSK 3 +#define MMCR1_GRS_MCSEL_SH 41 +#define MMCR1_GRS_MCSEL_MSK 7 +#define MMCR1_GRS_FABSEL_SH 39 +#define MMCR1_GRS_FABSEL_MSK 3 +#define MMCR1_PMC1_ADDER_SEL_SH 35 +#define MMCR1_PMC2_ADDER_SEL_SH 34 +#define MMCR1_PMC3_ADDER_SEL_SH 33 +#define MMCR1_PMC4_ADDER_SEL_SH 32 +#define MMCR1_PMC1SEL_SH 25 +#define MMCR1_PMC2SEL_SH 17 +#define MMCR1_PMC3SEL_SH 9 +#define MMCR1_PMC4SEL_SH 1 +#define MMCR1_PMCSEL_SH(n) (MMCR1_PMC1SEL_SH - (n) * 8) +#define MMCR1_PMCSEL_MSK 0x7f + +/* + * Bits in MMCRA + */ + +/* + * Layout of constraint bits: + * 6666555555555544444444443333333333222222222211111111110000000000 + * 3210987654321098765432109876543210987654321098765432109876543210 + * [ ><><>< ><> <><>[ > < >< >< >< ><><><><> + * NC G0G1G2 G3 T0T1 UC B0 B1 B2 B3 P4P3P2P1 + * + * NC - number of counters + * 51: NC error 0x0008_0000_0000_0000 + * 48-50: number of events needing PMC1-4 0x0007_0000_0000_0000 + * + * G0..G3 - GRS mux constraints + * 46-47: GRS_L2SEL value + * 44-45: GRS_L3SEL value + * 41-44: GRS_MCSEL value + * 39-40: GRS_FABSEL value + * Note that these match up with their bit positions in MMCR1 + * + * T0 - TTM0 constraint + * 36-37: TTM0SEL value (0=FPU, 2=IFU, 3=ISU1) 0x30_0000_0000 + * + * T1 - TTM1 constraint + * 34-35: TTM1SEL value (0=IDU, 3=GRS) 0x0c_0000_0000 + * + * UC - unit constraint: can't have all three of FPU|IFU|ISU1, ISU0, IDU|GRS + * 33: UC3 error 0x02_0000_0000 + * 32: FPU|IFU|ISU1 events needed 0x01_0000_0000 + * 31: ISU0 events needed 0x01_8000_0000 + * 30: IDU|GRS events needed 0x00_4000_0000 + * + * B0 + * 20-23: Byte 0 event source 0x00f0_0000 + * Encoding as for the event code + * + * B1, B2, B3 + * 16-19, 12-15, 8-11: Byte 1, 2, 3 event sources + * + * P4 + * 7: P1 error 0x80 + * 6-7: Count of events needing PMC4 + * + * P1..P3 + * 0-6: Count of events needing PMC1..PMC3 + */ + +static const int grsel_shift[8] = { + MMCR1_GRS_L2SEL_SH, MMCR1_GRS_L2SEL_SH, MMCR1_GRS_L2SEL_SH, + MMCR1_GRS_L3SEL_SH, MMCR1_GRS_L3SEL_SH, MMCR1_GRS_L3SEL_SH, + MMCR1_GRS_MCSEL_SH, MMCR1_GRS_FABSEL_SH +}; + +/* Masks and values for using events from the various units */ +static u64 unit_cons[PM_LASTUNIT+1][2] = { + [PM_FPU] = { 0x3200000000ull, 0x0100000000ull }, + [PM_ISU0] = { 0x0200000000ull, 0x0080000000ull }, + [PM_ISU1] = { 0x3200000000ull, 0x3100000000ull }, + [PM_IFU] = { 0x3200000000ull, 0x2100000000ull }, + [PM_IDU] = { 0x0e00000000ull, 0x0040000000ull }, + [PM_GRS] = { 0x0e00000000ull, 0x0c40000000ull }, +}; + +static int power5p_get_constraint(unsigned int event, u64 *maskp, u64 *valp) +{ + int pmc, byte, unit, sh; + int bit, fmask; + u64 mask = 0, value = 0; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 4) + return -1; + sh = (pmc - 1) * 2; + mask |= 2 << sh; + value |= 1 << sh; + } + if (event & PM_BUSEVENT_MSK) { + unit = (event >> PM_UNIT_SH) & PM_UNIT_MSK; + if (unit > PM_LASTUNIT) + return -1; + if (unit == PM_ISU0_ALT) + unit = PM_ISU0; + mask |= unit_cons[unit][0]; + value |= unit_cons[unit][1]; + byte = (event >> PM_BYTE_SH) & PM_BYTE_MSK; + if (byte >= 4) { + if (unit != PM_LSU1) + return -1; + /* Map LSU1 low word (bytes 4-7) to unit LSU1+1 */ + ++unit; + byte &= 3; + } + if (unit == PM_GRS) { + bit = event & 7; + fmask = (bit == 6)? 7: 3; + sh = grsel_shift[bit]; + mask |= (u64)fmask << sh; + value |= (u64)((event >> PM_GRS_SH) & fmask) << sh; + } + /* Set byte lane select field */ + mask |= 0xfULL << (20 - 4 * byte); + value |= (u64)unit << (20 - 4 * byte); + } + mask |= 0x8000000000000ull; + value |= 0x1000000000000ull; + *maskp = mask; + *valp = value; + return 0; +} + +#define MAX_ALT 3 /* at most 3 alternatives for any event */ + +static const unsigned int event_alternatives[][MAX_ALT] = { + { 0x100c0, 0x40001f }, /* PM_GCT_FULL_CYC */ + { 0x120e4, 0x400002 }, /* PM_GRP_DISP_REJECT */ + { 0x230e2, 0x323087 }, /* PM_BR_PRED_CR */ + { 0x230e3, 0x223087, 0x3230a0 }, /* PM_BR_PRED_TA */ + { 0x410c7, 0x441084 }, /* PM_THRD_L2MISS_BOTH_CYC */ + { 0x800c4, 0xc20e0 }, /* PM_DTLB_MISS */ + { 0xc50c6, 0xc60e0 }, /* PM_MRK_DTLB_MISS */ + { 0x100009, 0x200009 }, /* PM_INST_CMPL */ + { 0x200015, 0x300015 }, /* PM_LSU_LMQ_SRQ_EMPTY_CYC */ + { 0x300009, 0x400009 }, /* PM_INST_DISP */ +}; + +/* + * Scan the alternatives table for a match and return the + * index into the alternatives table if found, else -1. + */ +static int find_alternative(unsigned int event) +{ + int i, j; + + for (i = 0; i < ARRAY_SIZE(event_alternatives); ++i) { + if (event < event_alternatives[i][0]) + break; + for (j = 0; j < MAX_ALT && event_alternatives[i][j]; ++j) + if (event == event_alternatives[i][j]) + return i; + } + return -1; +} + +static const unsigned char bytedecode_alternatives[4][4] = { + /* PMC 1 */ { 0x21, 0x23, 0x25, 0x27 }, + /* PMC 2 */ { 0x07, 0x17, 0x0e, 0x1e }, + /* PMC 3 */ { 0x20, 0x22, 0x24, 0x26 }, + /* PMC 4 */ { 0x07, 0x17, 0x0e, 0x1e } +}; + +/* + * Some direct events for decodes of event bus byte 3 have alternative + * PMCSEL values on other counters. This returns the alternative + * event code for those that do, or -1 otherwise. This also handles + * alternative PCMSEL values for add events. + */ +static int find_alternative_bdecode(unsigned int event) +{ + int pmc, altpmc, pp, j; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc == 0 || pmc > 4) + return -1; + altpmc = 5 - pmc; /* 1 <-> 4, 2 <-> 3 */ + pp = event & PM_PMCSEL_MSK; + for (j = 0; j < 4; ++j) { + if (bytedecode_alternatives[pmc - 1][j] == pp) { + return (event & ~(PM_PMC_MSKS | PM_PMCSEL_MSK)) | + (altpmc << PM_PMC_SH) | + bytedecode_alternatives[altpmc - 1][j]; + } + } + + /* new decode alternatives for power5+ */ + if (pmc == 1 && (pp == 0x0d || pp == 0x0e)) + return event + (2 << PM_PMC_SH) + (0x2e - 0x0d); + if (pmc == 3 && (pp == 0x2e || pp == 0x2f)) + return event - (2 << PM_PMC_SH) - (0x2e - 0x0d); + + /* alternative add event encodings */ + if (pp == 0x10 || pp == 0x28) + return ((event ^ (0x10 ^ 0x28)) & ~PM_PMC_MSKS) | + (altpmc << PM_PMC_SH); + + return -1; +} + +static int power5p_get_alternatives(unsigned int event, unsigned int alt[]) +{ + int i, j, ae, nalt = 1; + + alt[0] = event; + nalt = 1; + i = find_alternative(event); + if (i >= 0) { + for (j = 0; j < MAX_ALT; ++j) { + ae = event_alternatives[i][j]; + if (ae && ae != event) + alt[nalt++] = ae; + } + } else { + ae = find_alternative_bdecode(event); + if (ae > 0) + alt[nalt++] = ae; + } + return nalt; +} + +static int power5p_compute_mmcr(unsigned int event[], int n_ev, + unsigned int hwc[], u64 mmcr[]) +{ + u64 mmcr1 = 0; + unsigned int pmc, unit, byte, psel; + unsigned int ttm; + int i, isbus, bit, grsel; + unsigned int pmc_inuse = 0; + unsigned char busbyte[4]; + unsigned char unituse[16]; + int ttmuse; + + if (n_ev > 4) + return -1; + + /* First pass to count resource use */ + memset(busbyte, 0, sizeof(busbyte)); + memset(unituse, 0, sizeof(unituse)); + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 4) + return -1; + if (pmc_inuse & (1 << (pmc - 1))) + return -1; + pmc_inuse |= 1 << (pmc - 1); + } + if (event[i] & PM_BUSEVENT_MSK) { + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + if (unit > PM_LASTUNIT) + return -1; + if (unit == PM_ISU0_ALT) + unit = PM_ISU0; + if (byte >= 4) { + if (unit != PM_LSU1) + return -1; + ++unit; + byte &= 3; + } + if (busbyte[byte] && busbyte[byte] != unit) + return -1; + busbyte[byte] = unit; + unituse[unit] = 1; + } + } + + /* + * Assign resources and set multiplexer selects. + * + * PM_ISU0 can go either on TTM0 or TTM1, but that's the only + * choice we have to deal with. + */ + if (unituse[PM_ISU0] & + (unituse[PM_FPU] | unituse[PM_IFU] | unituse[PM_ISU1])) { + unituse[PM_ISU0_ALT] = 1; /* move ISU to TTM1 */ + unituse[PM_ISU0] = 0; + } + /* Set TTM[01]SEL fields. */ + ttmuse = 0; + for (i = PM_FPU; i <= PM_ISU1; ++i) { + if (!unituse[i]) + continue; + if (ttmuse++) + return -1; + mmcr1 |= (u64)i << MMCR1_TTM0SEL_SH; + } + ttmuse = 0; + for (; i <= PM_GRS; ++i) { + if (!unituse[i]) + continue; + if (ttmuse++) + return -1; + mmcr1 |= (u64)(i & 3) << MMCR1_TTM1SEL_SH; + } + if (ttmuse > 1) + return -1; + + /* Set byte lane select fields, TTM[23]SEL and GRS_*SEL. */ + for (byte = 0; byte < 4; ++byte) { + unit = busbyte[byte]; + if (!unit) + continue; + if (unit == PM_ISU0 && unituse[PM_ISU0_ALT]) { + /* get ISU0 through TTM1 rather than TTM0 */ + unit = PM_ISU0_ALT; + } else if (unit == PM_LSU1 + 1) { + /* select lower word of LSU1 for this byte */ + mmcr1 |= 1ull << (MMCR1_TTM3SEL_SH + 3 - byte); + } + ttm = unit >> 2; + mmcr1 |= (u64)ttm << (MMCR1_TD_CP_DBG0SEL_SH - 2 * byte); + } + + /* Second pass: assign PMCs, set PMCxSEL and PMCx_ADDER_SEL fields */ + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + psel = event[i] & PM_PMCSEL_MSK; + isbus = event[i] & PM_BUSEVENT_MSK; + if (!pmc) { + /* Bus event or any-PMC direct event */ + for (pmc = 0; pmc < 4; ++pmc) { + if (!(pmc_inuse & (1 << pmc))) + break; + } + if (pmc >= 4) + return -1; + pmc_inuse |= 1 << pmc; + } else { + /* Direct event */ + --pmc; + if (isbus && (byte & 2) && + (psel == 8 || psel == 0x10 || psel == 0x28)) + /* add events on higher-numbered bus */ + mmcr1 |= 1ull << (MMCR1_PMC1_ADDER_SEL_SH - pmc); + } + if (isbus && unit == PM_GRS) { + bit = psel & 7; + grsel = (event[i] >> PM_GRS_SH) & PM_GRS_MSK; + mmcr1 |= (u64)grsel << grsel_shift[bit]; + } + if ((psel & 0x58) == 0x40 && (byte & 1) != ((pmc >> 1) & 1)) + /* select alternate byte lane */ + psel |= 0x10; + if (pmc <= 3) + mmcr1 |= psel << MMCR1_PMCSEL_SH(pmc); + hwc[i] = pmc; + } + + /* Return MMCRx values */ + mmcr[0] = 0; + if (pmc_inuse & 1) + mmcr[0] = MMCR0_PMC1CE; + if (pmc_inuse & 0x3e) + mmcr[0] |= MMCR0_PMCjCE; + mmcr[1] = mmcr1; + mmcr[2] = 0; + return 0; +} + +static void power5p_disable_pmc(unsigned int pmc, u64 mmcr[]) +{ + if (pmc <= 3) + mmcr[1] &= ~(0x7fUL << MMCR1_PMCSEL_SH(pmc)); +} + +static int power5p_generic_events[] = { + [PERF_COUNT_CPU_CYCLES] = 0xf, + [PERF_COUNT_INSTRUCTIONS] = 0x100009, + [PERF_COUNT_CACHE_REFERENCES] = 0x1c10a8, /* LD_REF_L1 */ + [PERF_COUNT_CACHE_MISSES] = 0x3c1088, /* LD_MISS_L1 */ + [PERF_COUNT_BRANCH_INSTRUCTIONS] = 0x230e4, /* BR_ISSUED */ + [PERF_COUNT_BRANCH_MISSES] = 0x230e5, /* BR_MPRED_CR */ +}; + +struct power_pmu power5p_pmu = { + .n_counter = 4, + .max_alternatives = MAX_ALT, + .add_fields = 0x7000000000055ull, + .test_adder = 0x3000040000000ull, + .compute_mmcr = power5p_compute_mmcr, + .get_constraint = power5p_get_constraint, + .get_alternatives = power5p_get_alternatives, + .disable_pmc = power5p_disable_pmc, + .n_generic = ARRAY_SIZE(power5p_generic_events), + .generic_events = power5p_generic_events, +}; Index: linux-2.6-tip/arch/powerpc/kernel/power5-pmu.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/kernel/power5-pmu.c @@ -0,0 +1,475 @@ +/* + * Performance counter support for POWER5 (not POWER5++) processors. + * + * Copyright 2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include + +/* + * Bits in event code for POWER5 (not POWER5++) + */ +#define PM_PMC_SH 20 /* PMC number (1-based) for direct events */ +#define PM_PMC_MSK 0xf +#define PM_PMC_MSKS (PM_PMC_MSK << PM_PMC_SH) +#define PM_UNIT_SH 16 /* TTMMUX number and setting - unit select */ +#define PM_UNIT_MSK 0xf +#define PM_BYTE_SH 12 /* Byte number of event bus to use */ +#define PM_BYTE_MSK 7 +#define PM_GRS_SH 8 /* Storage subsystem mux select */ +#define PM_GRS_MSK 7 +#define PM_BUSEVENT_MSK 0x80 /* Set if event uses event bus */ +#define PM_PMCSEL_MSK 0x7f + +/* Values in PM_UNIT field */ +#define PM_FPU 0 +#define PM_ISU0 1 +#define PM_IFU 2 +#define PM_ISU1 3 +#define PM_IDU 4 +#define PM_ISU0_ALT 6 +#define PM_GRS 7 +#define PM_LSU0 8 +#define PM_LSU1 0xc +#define PM_LASTUNIT 0xc + +/* + * Bits in MMCR1 for POWER5 + */ +#define MMCR1_TTM0SEL_SH 62 +#define MMCR1_TTM1SEL_SH 60 +#define MMCR1_TTM2SEL_SH 58 +#define MMCR1_TTM3SEL_SH 56 +#define MMCR1_TTMSEL_MSK 3 +#define MMCR1_TD_CP_DBG0SEL_SH 54 +#define MMCR1_TD_CP_DBG1SEL_SH 52 +#define MMCR1_TD_CP_DBG2SEL_SH 50 +#define MMCR1_TD_CP_DBG3SEL_SH 48 +#define MMCR1_GRS_L2SEL_SH 46 +#define MMCR1_GRS_L2SEL_MSK 3 +#define MMCR1_GRS_L3SEL_SH 44 +#define MMCR1_GRS_L3SEL_MSK 3 +#define MMCR1_GRS_MCSEL_SH 41 +#define MMCR1_GRS_MCSEL_MSK 7 +#define MMCR1_GRS_FABSEL_SH 39 +#define MMCR1_GRS_FABSEL_MSK 3 +#define MMCR1_PMC1_ADDER_SEL_SH 35 +#define MMCR1_PMC2_ADDER_SEL_SH 34 +#define MMCR1_PMC3_ADDER_SEL_SH 33 +#define MMCR1_PMC4_ADDER_SEL_SH 32 +#define MMCR1_PMC1SEL_SH 25 +#define MMCR1_PMC2SEL_SH 17 +#define MMCR1_PMC3SEL_SH 9 +#define MMCR1_PMC4SEL_SH 1 +#define MMCR1_PMCSEL_SH(n) (MMCR1_PMC1SEL_SH - (n) * 8) +#define MMCR1_PMCSEL_MSK 0x7f + +/* + * Bits in MMCRA + */ + +/* + * Layout of constraint bits: + * 6666555555555544444444443333333333222222222211111111110000000000 + * 3210987654321098765432109876543210987654321098765432109876543210 + * <><>[ ><><>< ><> [ >[ >[ >< >< >< >< ><><><><><><> + * T0T1 NC G0G1G2 G3 UC PS1PS2 B0 B1 B2 B3 P6P5P4P3P2P1 + * + * T0 - TTM0 constraint + * 54-55: TTM0SEL value (0=FPU, 2=IFU, 3=ISU1) 0xc0_0000_0000_0000 + * + * T1 - TTM1 constraint + * 52-53: TTM1SEL value (0=IDU, 3=GRS) 0x30_0000_0000_0000 + * + * NC - number of counters + * 51: NC error 0x0008_0000_0000_0000 + * 48-50: number of events needing PMC1-4 0x0007_0000_0000_0000 + * + * G0..G3 - GRS mux constraints + * 46-47: GRS_L2SEL value + * 44-45: GRS_L3SEL value + * 41-44: GRS_MCSEL value + * 39-40: GRS_FABSEL value + * Note that these match up with their bit positions in MMCR1 + * + * UC - unit constraint: can't have all three of FPU|IFU|ISU1, ISU0, IDU|GRS + * 37: UC3 error 0x20_0000_0000 + * 36: FPU|IFU|ISU1 events needed 0x10_0000_0000 + * 35: ISU0 events needed 0x08_0000_0000 + * 34: IDU|GRS events needed 0x04_0000_0000 + * + * PS1 + * 33: PS1 error 0x2_0000_0000 + * 31-32: count of events needing PMC1/2 0x1_8000_0000 + * + * PS2 + * 30: PS2 error 0x4000_0000 + * 28-29: count of events needing PMC3/4 0x3000_0000 + * + * B0 + * 24-27: Byte 0 event source 0x0f00_0000 + * Encoding as for the event code + * + * B1, B2, B3 + * 20-23, 16-19, 12-15: Byte 1, 2, 3 event sources + * + * P1..P6 + * 0-11: Count of events needing PMC1..PMC6 + */ + +static const int grsel_shift[8] = { + MMCR1_GRS_L2SEL_SH, MMCR1_GRS_L2SEL_SH, MMCR1_GRS_L2SEL_SH, + MMCR1_GRS_L3SEL_SH, MMCR1_GRS_L3SEL_SH, MMCR1_GRS_L3SEL_SH, + MMCR1_GRS_MCSEL_SH, MMCR1_GRS_FABSEL_SH +}; + +/* Masks and values for using events from the various units */ +static u64 unit_cons[PM_LASTUNIT+1][2] = { + [PM_FPU] = { 0xc0002000000000ull, 0x00001000000000ull }, + [PM_ISU0] = { 0x00002000000000ull, 0x00000800000000ull }, + [PM_ISU1] = { 0xc0002000000000ull, 0xc0001000000000ull }, + [PM_IFU] = { 0xc0002000000000ull, 0x80001000000000ull }, + [PM_IDU] = { 0x30002000000000ull, 0x00000400000000ull }, + [PM_GRS] = { 0x30002000000000ull, 0x30000400000000ull }, +}; + +static int power5_get_constraint(unsigned int event, u64 *maskp, u64 *valp) +{ + int pmc, byte, unit, sh; + int bit, fmask; + u64 mask = 0, value = 0; + int grp = -1; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 6) + return -1; + sh = (pmc - 1) * 2; + mask |= 2 << sh; + value |= 1 << sh; + if (pmc <= 4) + grp = (pmc - 1) >> 1; + else if (event != 0x500009 && event != 0x600005) + return -1; + } + if (event & PM_BUSEVENT_MSK) { + unit = (event >> PM_UNIT_SH) & PM_UNIT_MSK; + if (unit > PM_LASTUNIT) + return -1; + if (unit == PM_ISU0_ALT) + unit = PM_ISU0; + mask |= unit_cons[unit][0]; + value |= unit_cons[unit][1]; + byte = (event >> PM_BYTE_SH) & PM_BYTE_MSK; + if (byte >= 4) { + if (unit != PM_LSU1) + return -1; + /* Map LSU1 low word (bytes 4-7) to unit LSU1+1 */ + ++unit; + byte &= 3; + } + if (unit == PM_GRS) { + bit = event & 7; + fmask = (bit == 6)? 7: 3; + sh = grsel_shift[bit]; + mask |= (u64)fmask << sh; + value |= (u64)((event >> PM_GRS_SH) & fmask) << sh; + } + /* + * Bus events on bytes 0 and 2 can be counted + * on PMC1/2; bytes 1 and 3 on PMC3/4. + */ + if (!pmc) + grp = byte & 1; + /* Set byte lane select field */ + mask |= 0xfULL << (24 - 4 * byte); + value |= (u64)unit << (24 - 4 * byte); + } + if (grp == 0) { + /* increment PMC1/2 field */ + mask |= 0x200000000ull; + value |= 0x080000000ull; + } else if (grp == 1) { + /* increment PMC3/4 field */ + mask |= 0x40000000ull; + value |= 0x10000000ull; + } + if (pmc < 5) { + /* need a counter from PMC1-4 set */ + mask |= 0x8000000000000ull; + value |= 0x1000000000000ull; + } + *maskp = mask; + *valp = value; + return 0; +} + +#define MAX_ALT 3 /* at most 3 alternatives for any event */ + +static const unsigned int event_alternatives[][MAX_ALT] = { + { 0x120e4, 0x400002 }, /* PM_GRP_DISP_REJECT */ + { 0x410c7, 0x441084 }, /* PM_THRD_L2MISS_BOTH_CYC */ + { 0x100005, 0x600005 }, /* PM_RUN_CYC */ + { 0x100009, 0x200009, 0x500009 }, /* PM_INST_CMPL */ + { 0x300009, 0x400009 }, /* PM_INST_DISP */ +}; + +/* + * Scan the alternatives table for a match and return the + * index into the alternatives table if found, else -1. + */ +static int find_alternative(unsigned int event) +{ + int i, j; + + for (i = 0; i < ARRAY_SIZE(event_alternatives); ++i) { + if (event < event_alternatives[i][0]) + break; + for (j = 0; j < MAX_ALT && event_alternatives[i][j]; ++j) + if (event == event_alternatives[i][j]) + return i; + } + return -1; +} + +static const unsigned char bytedecode_alternatives[4][4] = { + /* PMC 1 */ { 0x21, 0x23, 0x25, 0x27 }, + /* PMC 2 */ { 0x07, 0x17, 0x0e, 0x1e }, + /* PMC 3 */ { 0x20, 0x22, 0x24, 0x26 }, + /* PMC 4 */ { 0x07, 0x17, 0x0e, 0x1e } +}; + +/* + * Some direct events for decodes of event bus byte 3 have alternative + * PMCSEL values on other counters. This returns the alternative + * event code for those that do, or -1 otherwise. + */ +static int find_alternative_bdecode(unsigned int event) +{ + int pmc, altpmc, pp, j; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc == 0 || pmc > 4) + return -1; + altpmc = 5 - pmc; /* 1 <-> 4, 2 <-> 3 */ + pp = event & PM_PMCSEL_MSK; + for (j = 0; j < 4; ++j) { + if (bytedecode_alternatives[pmc - 1][j] == pp) { + return (event & ~(PM_PMC_MSKS | PM_PMCSEL_MSK)) | + (altpmc << PM_PMC_SH) | + bytedecode_alternatives[altpmc - 1][j]; + } + } + return -1; +} + +static int power5_get_alternatives(unsigned int event, unsigned int alt[]) +{ + int i, j, ae, nalt = 1; + + alt[0] = event; + nalt = 1; + i = find_alternative(event); + if (i >= 0) { + for (j = 0; j < MAX_ALT; ++j) { + ae = event_alternatives[i][j]; + if (ae && ae != event) + alt[nalt++] = ae; + } + } else { + ae = find_alternative_bdecode(event); + if (ae > 0) + alt[nalt++] = ae; + } + return nalt; +} + +static int power5_compute_mmcr(unsigned int event[], int n_ev, + unsigned int hwc[], u64 mmcr[]) +{ + u64 mmcr1 = 0; + unsigned int pmc, unit, byte, psel; + unsigned int ttm, grp; + int i, isbus, bit, grsel; + unsigned int pmc_inuse = 0; + unsigned int pmc_grp_use[2]; + unsigned char busbyte[4]; + unsigned char unituse[16]; + int ttmuse; + + if (n_ev > 6) + return -1; + + /* First pass to count resource use */ + pmc_grp_use[0] = pmc_grp_use[1] = 0; + memset(busbyte, 0, sizeof(busbyte)); + memset(unituse, 0, sizeof(unituse)); + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 6) + return -1; + if (pmc_inuse & (1 << (pmc - 1))) + return -1; + pmc_inuse |= 1 << (pmc - 1); + /* count 1/2 vs 3/4 use */ + if (pmc <= 4) + ++pmc_grp_use[(pmc - 1) >> 1]; + } + if (event[i] & PM_BUSEVENT_MSK) { + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + if (unit > PM_LASTUNIT) + return -1; + if (unit == PM_ISU0_ALT) + unit = PM_ISU0; + if (byte >= 4) { + if (unit != PM_LSU1) + return -1; + ++unit; + byte &= 3; + } + if (!pmc) + ++pmc_grp_use[byte & 1]; + if (busbyte[byte] && busbyte[byte] != unit) + return -1; + busbyte[byte] = unit; + unituse[unit] = 1; + } + } + if (pmc_grp_use[0] > 2 || pmc_grp_use[1] > 2) + return -1; + + /* + * Assign resources and set multiplexer selects. + * + * PM_ISU0 can go either on TTM0 or TTM1, but that's the only + * choice we have to deal with. + */ + if (unituse[PM_ISU0] & + (unituse[PM_FPU] | unituse[PM_IFU] | unituse[PM_ISU1])) { + unituse[PM_ISU0_ALT] = 1; /* move ISU to TTM1 */ + unituse[PM_ISU0] = 0; + } + /* Set TTM[01]SEL fields. */ + ttmuse = 0; + for (i = PM_FPU; i <= PM_ISU1; ++i) { + if (!unituse[i]) + continue; + if (ttmuse++) + return -1; + mmcr1 |= (u64)i << MMCR1_TTM0SEL_SH; + } + ttmuse = 0; + for (; i <= PM_GRS; ++i) { + if (!unituse[i]) + continue; + if (ttmuse++) + return -1; + mmcr1 |= (u64)(i & 3) << MMCR1_TTM1SEL_SH; + } + if (ttmuse > 1) + return -1; + + /* Set byte lane select fields, TTM[23]SEL and GRS_*SEL. */ + for (byte = 0; byte < 4; ++byte) { + unit = busbyte[byte]; + if (!unit) + continue; + if (unit == PM_ISU0 && unituse[PM_ISU0_ALT]) { + /* get ISU0 through TTM1 rather than TTM0 */ + unit = PM_ISU0_ALT; + } else if (unit == PM_LSU1 + 1) { + /* select lower word of LSU1 for this byte */ + mmcr1 |= 1ull << (MMCR1_TTM3SEL_SH + 3 - byte); + } + ttm = unit >> 2; + mmcr1 |= (u64)ttm << (MMCR1_TD_CP_DBG0SEL_SH - 2 * byte); + } + + /* Second pass: assign PMCs, set PMCxSEL and PMCx_ADDER_SEL fields */ + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + psel = event[i] & PM_PMCSEL_MSK; + isbus = event[i] & PM_BUSEVENT_MSK; + if (!pmc) { + /* Bus event or any-PMC direct event */ + for (pmc = 0; pmc < 4; ++pmc) { + if (pmc_inuse & (1 << pmc)) + continue; + grp = (pmc >> 1) & 1; + if (isbus) { + if (grp == (byte & 1)) + break; + } else if (pmc_grp_use[grp] < 2) { + ++pmc_grp_use[grp]; + break; + } + } + pmc_inuse |= 1 << pmc; + } else if (pmc <= 4) { + /* Direct event */ + --pmc; + if ((psel == 8 || psel == 0x10) && isbus && (byte & 2)) + /* add events on higher-numbered bus */ + mmcr1 |= 1ull << (MMCR1_PMC1_ADDER_SEL_SH - pmc); + } else { + /* Instructions or run cycles on PMC5/6 */ + --pmc; + } + if (isbus && unit == PM_GRS) { + bit = psel & 7; + grsel = (event[i] >> PM_GRS_SH) & PM_GRS_MSK; + mmcr1 |= (u64)grsel << grsel_shift[bit]; + } + if (pmc <= 3) + mmcr1 |= psel << MMCR1_PMCSEL_SH(pmc); + hwc[i] = pmc; + } + + /* Return MMCRx values */ + mmcr[0] = 0; + if (pmc_inuse & 1) + mmcr[0] = MMCR0_PMC1CE; + if (pmc_inuse & 0x3e) + mmcr[0] |= MMCR0_PMCjCE; + mmcr[1] = mmcr1; + mmcr[2] = 0; + return 0; +} + +static void power5_disable_pmc(unsigned int pmc, u64 mmcr[]) +{ + if (pmc <= 3) + mmcr[1] &= ~(0x7fUL << MMCR1_PMCSEL_SH(pmc)); +} + +static int power5_generic_events[] = { + [PERF_COUNT_CPU_CYCLES] = 0xf, + [PERF_COUNT_INSTRUCTIONS] = 0x100009, + [PERF_COUNT_CACHE_REFERENCES] = 0x4c1090, /* LD_REF_L1 */ + [PERF_COUNT_CACHE_MISSES] = 0x3c1088, /* LD_MISS_L1 */ + [PERF_COUNT_BRANCH_INSTRUCTIONS] = 0x230e4, /* BR_ISSUED */ + [PERF_COUNT_BRANCH_MISSES] = 0x230e5, /* BR_MPRED_CR */ +}; + +struct power_pmu power5_pmu = { + .n_counter = 6, + .max_alternatives = MAX_ALT, + .add_fields = 0x7000090000555ull, + .test_adder = 0x3000490000000ull, + .compute_mmcr = power5_compute_mmcr, + .get_constraint = power5_get_constraint, + .get_alternatives = power5_get_alternatives, + .disable_pmc = power5_disable_pmc, + .n_generic = ARRAY_SIZE(power5_generic_events), + .generic_events = power5_generic_events, +}; Index: linux-2.6-tip/arch/powerpc/kernel/power6-pmu.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/kernel/power6-pmu.c @@ -0,0 +1,283 @@ +/* + * Performance counter support for POWER6 processors. + * + * Copyright 2008-2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include + +/* + * Bits in event code for POWER6 + */ +#define PM_PMC_SH 20 /* PMC number (1-based) for direct events */ +#define PM_PMC_MSK 0x7 +#define PM_PMC_MSKS (PM_PMC_MSK << PM_PMC_SH) +#define PM_UNIT_SH 16 /* Unit event comes (TTMxSEL encoding) */ +#define PM_UNIT_MSK 0xf +#define PM_UNIT_MSKS (PM_UNIT_MSK << PM_UNIT_SH) +#define PM_LLAV 0x8000 /* Load lookahead match value */ +#define PM_LLA 0x4000 /* Load lookahead match enable */ +#define PM_BYTE_SH 12 /* Byte of event bus to use */ +#define PM_BYTE_MSK 3 +#define PM_SUBUNIT_SH 8 /* Subunit event comes from (NEST_SEL enc.) */ +#define PM_SUBUNIT_MSK 7 +#define PM_SUBUNIT_MSKS (PM_SUBUNIT_MSK << PM_SUBUNIT_SH) +#define PM_PMCSEL_MSK 0xff /* PMCxSEL value */ +#define PM_BUSEVENT_MSK 0xf3700 + +/* + * Bits in MMCR1 for POWER6 + */ +#define MMCR1_TTM0SEL_SH 60 +#define MMCR1_TTMSEL_SH(n) (MMCR1_TTM0SEL_SH - (n) * 4) +#define MMCR1_TTMSEL_MSK 0xf +#define MMCR1_TTMSEL(m, n) (((m) >> MMCR1_TTMSEL_SH(n)) & MMCR1_TTMSEL_MSK) +#define MMCR1_NESTSEL_SH 45 +#define MMCR1_NESTSEL_MSK 0x7 +#define MMCR1_NESTSEL(m) (((m) >> MMCR1_NESTSEL_SH) & MMCR1_NESTSEL_MSK) +#define MMCR1_PMC1_LLA ((u64)1 << 44) +#define MMCR1_PMC1_LLA_VALUE ((u64)1 << 39) +#define MMCR1_PMC1_ADDR_SEL ((u64)1 << 35) +#define MMCR1_PMC1SEL_SH 24 +#define MMCR1_PMCSEL_SH(n) (MMCR1_PMC1SEL_SH - (n) * 8) +#define MMCR1_PMCSEL_MSK 0xff + +/* + * Assign PMC numbers and compute MMCR1 value for a set of events + */ +static int p6_compute_mmcr(unsigned int event[], int n_ev, + unsigned int hwc[], u64 mmcr[]) +{ + u64 mmcr1 = 0; + int i; + unsigned int pmc, ev, b, u, s, psel; + unsigned int ttmset = 0; + unsigned int pmc_inuse = 0; + + if (n_ev > 4) + return -1; + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc_inuse & (1 << (pmc - 1))) + return -1; /* collision! */ + pmc_inuse |= 1 << (pmc - 1); + } + } + for (i = 0; i < n_ev; ++i) { + ev = event[i]; + pmc = (ev >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + --pmc; + } else { + /* can go on any PMC; find a free one */ + for (pmc = 0; pmc < 4; ++pmc) + if (!(pmc_inuse & (1 << pmc))) + break; + pmc_inuse |= 1 << pmc; + } + hwc[i] = pmc; + psel = ev & PM_PMCSEL_MSK; + if (ev & PM_BUSEVENT_MSK) { + /* this event uses the event bus */ + b = (ev >> PM_BYTE_SH) & PM_BYTE_MSK; + u = (ev >> PM_UNIT_SH) & PM_UNIT_MSK; + /* check for conflict on this byte of event bus */ + if ((ttmset & (1 << b)) && MMCR1_TTMSEL(mmcr1, b) != u) + return -1; + mmcr1 |= (u64)u << MMCR1_TTMSEL_SH(b); + ttmset |= 1 << b; + if (u == 5) { + /* Nest events have a further mux */ + s = (ev >> PM_SUBUNIT_SH) & PM_SUBUNIT_MSK; + if ((ttmset & 0x10) && + MMCR1_NESTSEL(mmcr1) != s) + return -1; + ttmset |= 0x10; + mmcr1 |= (u64)s << MMCR1_NESTSEL_SH; + } + if (0x30 <= psel && psel <= 0x3d) { + /* these need the PMCx_ADDR_SEL bits */ + if (b >= 2) + mmcr1 |= MMCR1_PMC1_ADDR_SEL >> pmc; + } + /* bus select values are different for PMC3/4 */ + if (pmc >= 2 && (psel & 0x90) == 0x80) + psel ^= 0x20; + } + if (ev & PM_LLA) { + mmcr1 |= MMCR1_PMC1_LLA >> pmc; + if (ev & PM_LLAV) + mmcr1 |= MMCR1_PMC1_LLA_VALUE >> pmc; + } + mmcr1 |= (u64)psel << MMCR1_PMCSEL_SH(pmc); + } + mmcr[0] = 0; + if (pmc_inuse & 1) + mmcr[0] = MMCR0_PMC1CE; + if (pmc_inuse & 0xe) + mmcr[0] |= MMCR0_PMCjCE; + mmcr[1] = mmcr1; + mmcr[2] = 0; + return 0; +} + +/* + * Layout of constraint bits: + * + * 0-1 add field: number of uses of PMC1 (max 1) + * 2-3, 4-5, 6-7: ditto for PMC2, 3, 4 + * 8-10 select field: nest (subunit) event selector + * 16-19 select field: unit on byte 0 of event bus + * 20-23, 24-27, 28-31 ditto for bytes 1, 2, 3 + */ +static int p6_get_constraint(unsigned int event, u64 *maskp, u64 *valp) +{ + int pmc, byte, sh; + unsigned int mask = 0, value = 0; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 4) + return -1; + sh = (pmc - 1) * 2; + mask |= 2 << sh; + value |= 1 << sh; + } + if (event & PM_BUSEVENT_MSK) { + byte = (event >> PM_BYTE_SH) & PM_BYTE_MSK; + sh = byte * 4; + mask |= PM_UNIT_MSKS << sh; + value |= (event & PM_UNIT_MSKS) << sh; + if ((event & PM_UNIT_MSKS) == (5 << PM_UNIT_SH)) { + mask |= PM_SUBUNIT_MSKS; + value |= event & PM_SUBUNIT_MSKS; + } + } + *maskp = mask; + *valp = value; + return 0; +} + +#define MAX_ALT 4 /* at most 4 alternatives for any event */ + +static const unsigned int event_alternatives[][MAX_ALT] = { + { 0x0130e8, 0x2000f6, 0x3000fc }, /* PM_PTEG_RELOAD_VALID */ + { 0x080080, 0x10000d, 0x30000c, 0x4000f0 }, /* PM_LD_MISS_L1 */ + { 0x080088, 0x200054, 0x3000f0 }, /* PM_ST_MISS_L1 */ + { 0x10000a, 0x2000f4 }, /* PM_RUN_CYC */ + { 0x10000b, 0x2000f5 }, /* PM_RUN_COUNT */ + { 0x10000e, 0x400010 }, /* PM_PURR */ + { 0x100010, 0x4000f8 }, /* PM_FLUSH */ + { 0x10001a, 0x200010 }, /* PM_MRK_INST_DISP */ + { 0x100026, 0x3000f8 }, /* PM_TB_BIT_TRANS */ + { 0x100054, 0x2000f0 }, /* PM_ST_FIN */ + { 0x100056, 0x2000fc }, /* PM_L1_ICACHE_MISS */ + { 0x1000f0, 0x40000a }, /* PM_INST_IMC_MATCH_CMPL */ + { 0x1000f8, 0x200008 }, /* PM_GCT_EMPTY_CYC */ + { 0x1000fc, 0x400006 }, /* PM_LSU_DERAT_MISS_CYC */ + { 0x20000e, 0x400007 }, /* PM_LSU_DERAT_MISS */ + { 0x200012, 0x300012 }, /* PM_INST_DISP */ + { 0x2000f2, 0x3000f2 }, /* PM_INST_DISP */ + { 0x2000f8, 0x300010 }, /* PM_EXT_INT */ + { 0x2000fe, 0x300056 }, /* PM_DATA_FROM_L2MISS */ + { 0x2d0030, 0x30001a }, /* PM_MRK_FPU_FIN */ + { 0x30000a, 0x400018 }, /* PM_MRK_INST_FIN */ + { 0x3000f6, 0x40000e }, /* PM_L1_DCACHE_RELOAD_VALID */ + { 0x3000fe, 0x400056 }, /* PM_DATA_FROM_L3MISS */ +}; + +/* + * This could be made more efficient with a binary search on + * a presorted list, if necessary + */ +static int find_alternatives_list(unsigned int event) +{ + int i, j; + unsigned int alt; + + for (i = 0; i < ARRAY_SIZE(event_alternatives); ++i) { + if (event < event_alternatives[i][0]) + return -1; + for (j = 0; j < MAX_ALT; ++j) { + alt = event_alternatives[i][j]; + if (!alt || event < alt) + break; + if (event == alt) + return i; + } + } + return -1; +} + +static int p6_get_alternatives(unsigned int event, unsigned int alt[]) +{ + int i, j; + unsigned int aevent, psel, pmc; + unsigned int nalt = 1; + + alt[0] = event; + + /* check the alternatives table */ + i = find_alternatives_list(event); + if (i >= 0) { + /* copy out alternatives from list */ + for (j = 0; j < MAX_ALT; ++j) { + aevent = event_alternatives[i][j]; + if (!aevent) + break; + if (aevent != event) + alt[nalt++] = aevent; + } + + } else { + /* Check for alternative ways of computing sum events */ + /* PMCSEL 0x32 counter N == PMCSEL 0x34 counter 5-N */ + psel = event & (PM_PMCSEL_MSK & ~1); /* ignore edge bit */ + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc && (psel == 0x32 || psel == 0x34)) + alt[nalt++] = ((event ^ 0x6) & ~PM_PMC_MSKS) | + ((5 - pmc) << PM_PMC_SH); + + /* PMCSEL 0x38 counter N == PMCSEL 0x3a counter N+/-2 */ + if (pmc && (psel == 0x38 || psel == 0x3a)) + alt[nalt++] = ((event ^ 0x2) & ~PM_PMC_MSKS) | + ((pmc > 2? pmc - 2: pmc + 2) << PM_PMC_SH); + } + + return nalt; +} + +static void p6_disable_pmc(unsigned int pmc, u64 mmcr[]) +{ + /* Set PMCxSEL to 0 to disable PMCx */ + mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc)); +} + +static int power6_generic_events[] = { + [PERF_COUNT_CPU_CYCLES] = 0x1e, + [PERF_COUNT_INSTRUCTIONS] = 2, + [PERF_COUNT_CACHE_REFERENCES] = 0x280030, /* LD_REF_L1 */ + [PERF_COUNT_CACHE_MISSES] = 0x30000c, /* LD_MISS_L1 */ + [PERF_COUNT_BRANCH_INSTRUCTIONS] = 0x410a0, /* BR_PRED */ + [PERF_COUNT_BRANCH_MISSES] = 0x400052, /* BR_MPRED */ +}; + +struct power_pmu power6_pmu = { + .n_counter = 4, + .max_alternatives = MAX_ALT, + .add_fields = 0x55, + .test_adder = 0, + .compute_mmcr = p6_compute_mmcr, + .get_constraint = p6_get_constraint, + .get_alternatives = p6_get_alternatives, + .disable_pmc = p6_disable_pmc, + .n_generic = ARRAY_SIZE(power6_generic_events), + .generic_events = power6_generic_events, +}; Index: linux-2.6-tip/arch/powerpc/kernel/ppc970-pmu.c =================================================================== --- /dev/null +++ linux-2.6-tip/arch/powerpc/kernel/ppc970-pmu.c @@ -0,0 +1,375 @@ +/* + * Performance counter support for PPC970-family processors. + * + * Copyright 2008-2009 Paul Mackerras, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include + +/* + * Bits in event code for PPC970 + */ +#define PM_PMC_SH 12 /* PMC number (1-based) for direct events */ +#define PM_PMC_MSK 0xf +#define PM_UNIT_SH 8 /* TTMMUX number and setting - unit select */ +#define PM_UNIT_MSK 0xf +#define PM_BYTE_SH 4 /* Byte number of event bus to use */ +#define PM_BYTE_MSK 3 +#define PM_PMCSEL_MSK 0xf + +/* Values in PM_UNIT field */ +#define PM_NONE 0 +#define PM_FPU 1 +#define PM_VPU 2 +#define PM_ISU 3 +#define PM_IFU 4 +#define PM_IDU 5 +#define PM_STS 6 +#define PM_LSU0 7 +#define PM_LSU1U 8 +#define PM_LSU1L 9 +#define PM_LASTUNIT 9 + +/* + * Bits in MMCR0 for PPC970 + */ +#define MMCR0_PMC1SEL_SH 8 +#define MMCR0_PMC2SEL_SH 1 +#define MMCR_PMCSEL_MSK 0x1f + +/* + * Bits in MMCR1 for PPC970 + */ +#define MMCR1_TTM0SEL_SH 62 +#define MMCR1_TTM1SEL_SH 59 +#define MMCR1_TTM3SEL_SH 53 +#define MMCR1_TTMSEL_MSK 3 +#define MMCR1_TD_CP_DBG0SEL_SH 50 +#define MMCR1_TD_CP_DBG1SEL_SH 48 +#define MMCR1_TD_CP_DBG2SEL_SH 46 +#define MMCR1_TD_CP_DBG3SEL_SH 44 +#define MMCR1_PMC1_ADDER_SEL_SH 39 +#define MMCR1_PMC2_ADDER_SEL_SH 38 +#define MMCR1_PMC6_ADDER_SEL_SH 37 +#define MMCR1_PMC5_ADDER_SEL_SH 36 +#define MMCR1_PMC8_ADDER_SEL_SH 35 +#define MMCR1_PMC7_ADDER_SEL_SH 34 +#define MMCR1_PMC3_ADDER_SEL_SH 33 +#define MMCR1_PMC4_ADDER_SEL_SH 32 +#define MMCR1_PMC3SEL_SH 27 +#define MMCR1_PMC4SEL_SH 22 +#define MMCR1_PMC5SEL_SH 17 +#define MMCR1_PMC6SEL_SH 12 +#define MMCR1_PMC7SEL_SH 7 +#define MMCR1_PMC8SEL_SH 2 + +static short mmcr1_adder_bits[8] = { + MMCR1_PMC1_ADDER_SEL_SH, + MMCR1_PMC2_ADDER_SEL_SH, + MMCR1_PMC3_ADDER_SEL_SH, + MMCR1_PMC4_ADDER_SEL_SH, + MMCR1_PMC5_ADDER_SEL_SH, + MMCR1_PMC6_ADDER_SEL_SH, + MMCR1_PMC7_ADDER_SEL_SH, + MMCR1_PMC8_ADDER_SEL_SH +}; + +/* + * Bits in MMCRA + */ + +/* + * Layout of constraint bits: + * 6666555555555544444444443333333333222222222211111111110000000000 + * 3210987654321098765432109876543210987654321098765432109876543210 + * <><>[ >[ >[ >< >< >< >< ><><><><><><><><> + * T0T1 UC PS1 PS2 B0 B1 B2 B3 P1P2P3P4P5P6P7P8 + * + * T0 - TTM0 constraint + * 46-47: TTM0SEL value (0=FPU, 2=IFU, 3=VPU) 0xC000_0000_0000 + * + * T1 - TTM1 constraint + * 44-45: TTM1SEL value (0=IDU, 3=STS) 0x3000_0000_0000 + * + * UC - unit constraint: can't have all three of FPU|IFU|VPU, ISU, IDU|STS + * 43: UC3 error 0x0800_0000_0000 + * 42: FPU|IFU|VPU events needed 0x0400_0000_0000 + * 41: ISU events needed 0x0200_0000_0000 + * 40: IDU|STS events needed 0x0100_0000_0000 + * + * PS1 + * 39: PS1 error 0x0080_0000_0000 + * 36-38: count of events needing PMC1/2/5/6 0x0070_0000_0000 + * + * PS2 + * 35: PS2 error 0x0008_0000_0000 + * 32-34: count of events needing PMC3/4/7/8 0x0007_0000_0000 + * + * B0 + * 28-31: Byte 0 event source 0xf000_0000 + * Encoding as for the event code + * + * B1, B2, B3 + * 24-27, 20-23, 16-19: Byte 1, 2, 3 event sources + * + * P1 + * 15: P1 error 0x8000 + * 14-15: Count of events needing PMC1 + * + * P2..P8 + * 0-13: Count of events needing PMC2..PMC8 + */ + +/* Masks and values for using events from the various units */ +static u64 unit_cons[PM_LASTUNIT+1][2] = { + [PM_FPU] = { 0xc80000000000ull, 0x040000000000ull }, + [PM_VPU] = { 0xc80000000000ull, 0xc40000000000ull }, + [PM_ISU] = { 0x080000000000ull, 0x020000000000ull }, + [PM_IFU] = { 0xc80000000000ull, 0x840000000000ull }, + [PM_IDU] = { 0x380000000000ull, 0x010000000000ull }, + [PM_STS] = { 0x380000000000ull, 0x310000000000ull }, +}; + +static int p970_get_constraint(unsigned int event, u64 *maskp, u64 *valp) +{ + int pmc, byte, unit, sh; + u64 mask = 0, value = 0; + int grp = -1; + + pmc = (event >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc > 8) + return -1; + sh = (pmc - 1) * 2; + mask |= 2 << sh; + value |= 1 << sh; + grp = ((pmc - 1) >> 1) & 1; + } + unit = (event >> PM_UNIT_SH) & PM_UNIT_MSK; + if (unit) { + if (unit > PM_LASTUNIT) + return -1; + mask |= unit_cons[unit][0]; + value |= unit_cons[unit][1]; + byte = (event >> PM_BYTE_SH) & PM_BYTE_MSK; + /* + * Bus events on bytes 0 and 2 can be counted + * on PMC1/2/5/6; bytes 1 and 3 on PMC3/4/7/8. + */ + if (!pmc) + grp = byte & 1; + /* Set byte lane select field */ + mask |= 0xfULL << (28 - 4 * byte); + value |= (u64)unit << (28 - 4 * byte); + } + if (grp == 0) { + /* increment PMC1/2/5/6 field */ + mask |= 0x8000000000ull; + value |= 0x1000000000ull; + } else if (grp == 1) { + /* increment PMC3/4/7/8 field */ + mask |= 0x800000000ull; + value |= 0x100000000ull; + } + *maskp = mask; + *valp = value; + return 0; +} + +static int p970_get_alternatives(unsigned int event, unsigned int alt[]) +{ + alt[0] = event; + + /* 2 alternatives for LSU empty */ + if (event == 0x2002 || event == 0x3002) { + alt[1] = event ^ 0x1000; + return 2; + } + + return 1; +} + +static int p970_compute_mmcr(unsigned int event[], int n_ev, + unsigned int hwc[], u64 mmcr[]) +{ + u64 mmcr0 = 0, mmcr1 = 0, mmcra = 0; + unsigned int pmc, unit, byte, psel; + unsigned int ttm, grp; + unsigned int pmc_inuse = 0; + unsigned int pmc_grp_use[2]; + unsigned char busbyte[4]; + unsigned char unituse[16]; + unsigned char unitmap[] = { 0, 0<<3, 3<<3, 1<<3, 2<<3, 0|4, 3|4 }; + unsigned char ttmuse[2]; + unsigned char pmcsel[8]; + int i; + + if (n_ev > 8) + return -1; + + /* First pass to count resource use */ + pmc_grp_use[0] = pmc_grp_use[1] = 0; + memset(busbyte, 0, sizeof(busbyte)); + memset(unituse, 0, sizeof(unituse)); + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + if (pmc) { + if (pmc_inuse & (1 << (pmc - 1))) + return -1; + pmc_inuse |= 1 << (pmc - 1); + /* count 1/2/5/6 vs 3/4/7/8 use */ + ++pmc_grp_use[((pmc - 1) >> 1) & 1]; + } + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + if (unit) { + if (unit > PM_LASTUNIT) + return -1; + if (!pmc) + ++pmc_grp_use[byte & 1]; + if (busbyte[byte] && busbyte[byte] != unit) + return -1; + busbyte[byte] = unit; + unituse[unit] = 1; + } + } + if (pmc_grp_use[0] > 4 || pmc_grp_use[1] > 4) + return -1; + + /* + * Assign resources and set multiplexer selects. + * + * PM_ISU can go either on TTM0 or TTM1, but that's the only + * choice we have to deal with. + */ + if (unituse[PM_ISU] & + (unituse[PM_FPU] | unituse[PM_IFU] | unituse[PM_VPU])) + unitmap[PM_ISU] = 2 | 4; /* move ISU to TTM1 */ + /* Set TTM[01]SEL fields. */ + ttmuse[0] = ttmuse[1] = 0; + for (i = PM_FPU; i <= PM_STS; ++i) { + if (!unituse[i]) + continue; + ttm = unitmap[i]; + ++ttmuse[(ttm >> 2) & 1]; + mmcr1 |= (u64)(ttm & ~4) << MMCR1_TTM1SEL_SH; + } + /* Check only one unit per TTMx */ + if (ttmuse[0] > 1 || ttmuse[1] > 1) + return -1; + + /* Set byte lane select fields and TTM3SEL. */ + for (byte = 0; byte < 4; ++byte) { + unit = busbyte[byte]; + if (!unit) + continue; + if (unit <= PM_STS) + ttm = (unitmap[unit] >> 2) & 1; + else if (unit == PM_LSU0) + ttm = 2; + else { + ttm = 3; + if (unit == PM_LSU1L && byte >= 2) + mmcr1 |= 1ull << (MMCR1_TTM3SEL_SH + 3 - byte); + } + mmcr1 |= (u64)ttm << (MMCR1_TD_CP_DBG0SEL_SH - 2 * byte); + } + + /* Second pass: assign PMCs, set PMCxSEL and PMCx_ADDER_SEL fields */ + memset(pmcsel, 0x8, sizeof(pmcsel)); /* 8 means don't count */ + for (i = 0; i < n_ev; ++i) { + pmc = (event[i] >> PM_PMC_SH) & PM_PMC_MSK; + unit = (event[i] >> PM_UNIT_SH) & PM_UNIT_MSK; + byte = (event[i] >> PM_BYTE_SH) & PM_BYTE_MSK; + psel = event[i] & PM_PMCSEL_MSK; + if (!pmc) { + /* Bus event or any-PMC direct event */ + if (unit) + psel |= 0x10 | ((byte & 2) << 2); + else + psel |= 8; + for (pmc = 0; pmc < 8; ++pmc) { + if (pmc_inuse & (1 << pmc)) + continue; + grp = (pmc >> 1) & 1; + if (unit) { + if (grp == (byte & 1)) + break; + } else if (pmc_grp_use[grp] < 4) { + ++pmc_grp_use[grp]; + break; + } + } + pmc_inuse |= 1 << pmc; + } else { + /* Direct event */ + --pmc; + if (psel == 0 && (byte & 2)) + /* add events on higher-numbered bus */ + mmcr1 |= 1ull << mmcr1_adder_bits[pmc]; + } + pmcsel[pmc] = psel; + hwc[i] = pmc; + } + for (pmc = 0; pmc < 2; ++pmc) + mmcr0 |= pmcsel[pmc] << (MMCR0_PMC1SEL_SH - 7 * pmc); + for (; pmc < 8; ++pmc) + mmcr1 |= (u64)pmcsel[pmc] << (MMCR1_PMC3SEL_SH - 5 * (pmc - 2)); + if (pmc_inuse & 1) + mmcr0 |= MMCR0_PMC1CE; + if (pmc_inuse & 0xfe) + mmcr0 |= MMCR0_PMCjCE; + + mmcra |= 0x2000; /* mark only one IOP per PPC instruction */ + + /* Return MMCRx values */ + mmcr[0] = mmcr0; + mmcr[1] = mmcr1; + mmcr[2] = mmcra; + return 0; +} + +static void p970_disable_pmc(unsigned int pmc, u64 mmcr[]) +{ + int shift, i; + + if (pmc <= 1) { + shift = MMCR0_PMC1SEL_SH - 7 * pmc; + i = 0; + } else { + shift = MMCR1_PMC3SEL_SH - 5 * (pmc - 2); + i = 1; + } + /* + * Setting the PMCxSEL field to 0x08 disables PMC x. + */ + mmcr[i] = (mmcr[i] & ~(0x1fUL << shift)) | (0x08UL << shift); +} + +static int ppc970_generic_events[] = { + [PERF_COUNT_CPU_CYCLES] = 7, + [PERF_COUNT_INSTRUCTIONS] = 1, + [PERF_COUNT_CACHE_REFERENCES] = 0x8810, /* PM_LD_REF_L1 */ + [PERF_COUNT_CACHE_MISSES] = 0x3810, /* PM_LD_MISS_L1 */ + [PERF_COUNT_BRANCH_INSTRUCTIONS] = 0x431, /* PM_BR_ISSUED */ + [PERF_COUNT_BRANCH_MISSES] = 0x327, /* PM_GRP_BR_MPRED */ +}; + +struct power_pmu ppc970_pmu = { + .n_counter = 8, + .max_alternatives = 2, + .add_fields = 0x001100005555ull, + .test_adder = 0x013300000000ull, + .compute_mmcr = p970_compute_mmcr, + .get_constraint = p970_get_constraint, + .get_alternatives = p970_get_alternatives, + .disable_pmc = p970_disable_pmc, + .n_generic = ARRAY_SIZE(ppc970_generic_events), + .generic_events = ppc970_generic_events, +}; Index: linux-2.6-tip/arch/powerpc/kernel/vmlinux.lds.S =================================================================== --- linux-2.6-tip.orig/arch/powerpc/kernel/vmlinux.lds.S +++ linux-2.6-tip/arch/powerpc/kernel/vmlinux.lds.S @@ -181,13 +181,7 @@ SECTIONS __initramfs_end = .; } #endif - . = ALIGN(PAGE_SIZE); - .data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { - __per_cpu_start = .; - *(.data.percpu) - *(.data.percpu.shared_aligned) - __per_cpu_end = .; - } + PERCPU(PAGE_SIZE) . = ALIGN(8); .machine.desc : AT(ADDR(.machine.desc) - LOAD_OFFSET) { Index: linux-2.6-tip/arch/powerpc/mm/fault.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/mm/fault.c +++ linux-2.6-tip/arch/powerpc/mm/fault.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -158,7 +159,7 @@ int __kprobes do_page_fault(struct pt_re } #endif /* !(CONFIG_4xx || CONFIG_BOOKE)*/ - if (in_atomic() || mm == NULL) { + if (in_atomic() || mm == NULL || current->pagefault_disabled) { if (!user_mode(regs)) return SIGSEGV; /* in_atomic() in user mode is really bad, @@ -170,6 +171,8 @@ int __kprobes do_page_fault(struct pt_re die("Weird page fault", regs, SIGSEGV); } + perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs); + /* When running in the kernel we expect faults to occur only to * addresses in user space. All other faults represent errors in the * kernel and should generate an OOPS. Unfortunately, in the case of an @@ -321,6 +324,7 @@ good_area: } if (ret & VM_FAULT_MAJOR) { current->maj_flt++; + perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs); #ifdef CONFIG_PPC_SMLPAR if (firmware_has_feature(FW_FEATURE_CMO)) { preempt_disable(); @@ -328,8 +332,10 @@ good_area: preempt_enable(); } #endif - } else + } else { current->min_flt++; + perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs); + } up_read(&mm->mmap_sem); return 0; Index: linux-2.6-tip/arch/powerpc/platforms/Kconfig.cputype =================================================================== --- linux-2.6-tip.orig/arch/powerpc/platforms/Kconfig.cputype +++ linux-2.6-tip/arch/powerpc/platforms/Kconfig.cputype @@ -1,6 +1,7 @@ config PPC64 bool "64-bit kernel" default n + select HAVE_PERF_COUNTERS help This option selects whether a 32-bit or a 64-bit kernel will be built. Index: linux-2.6-tip/arch/powerpc/platforms/cell/interrupt.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/platforms/cell/interrupt.c +++ linux-2.6-tip/arch/powerpc/platforms/cell/interrupt.c @@ -237,8 +237,6 @@ extern int noirqdebug; static void handle_iic_irq(unsigned int irq, struct irq_desc *desc) { - const unsigned int cpu = smp_processor_id(); - spin_lock(&desc->lock); desc->status &= ~(IRQ_REPLAY | IRQ_WAITING); @@ -254,7 +252,7 @@ static void handle_iic_irq(unsigned int goto out_eoi; } - kstat_cpu(cpu).irqs[irq]++; + kstat_incr_irqs_this_cpu(irq, desc); /* Mark the IRQ currently in progress.*/ desc->status |= IRQ_INPROGRESS; Index: linux-2.6-tip/arch/powerpc/platforms/cell/spufs/sched.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/platforms/cell/spufs/sched.c +++ linux-2.6-tip/arch/powerpc/platforms/cell/spufs/sched.c @@ -508,7 +508,7 @@ static void __spu_add_to_rq(struct spu_c list_add_tail(&ctx->rq, &spu_prio->runq[ctx->prio]); set_bit(ctx->prio, spu_prio->bitmap); if (!spu_prio->nr_waiting++) - __mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK); + mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK); } } Index: linux-2.6-tip/arch/powerpc/platforms/pseries/xics.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/platforms/pseries/xics.c +++ linux-2.6-tip/arch/powerpc/platforms/pseries/xics.c @@ -153,9 +153,10 @@ static int get_irq_server(unsigned int v { int server; /* For the moment only implement delivery to all cpus or one cpu */ - cpumask_t cpumask = irq_desc[virq].affinity; + cpumask_t cpumask; cpumask_t tmp = CPU_MASK_NONE; + cpumask_copy(&cpumask, irq_desc[virq].affinity); if (!distribute_irqs) return default_server; @@ -869,7 +870,7 @@ void xics_migrate_irqs_away(void) virq, cpu); /* Reset affinity to all cpus */ - irq_desc[virq].affinity = CPU_MASK_ALL; + cpumask_setall(irq_desc[virq].affinity); desc->chip->set_affinity(virq, cpu_all_mask); unlock: spin_unlock_irqrestore(&desc->lock, flags); Index: linux-2.6-tip/arch/powerpc/sysdev/mpic.c =================================================================== --- linux-2.6-tip.orig/arch/powerpc/sysdev/mpic.c +++ linux-2.6-tip/arch/powerpc/sysdev/mpic.c @@ -46,7 +46,7 @@ static struct mpic *mpics; static struct mpic *mpic_primary; -static DEFINE_SPINLOCK(mpic_lock); +static DEFINE_RAW_SPINLOCK(mpic_lock); #ifdef CONFIG_PPC32 /* XXX for now */ #ifdef CONFIG_IRQ_ALL_CPUS @@ -566,9 +566,10 @@ static void __init mpic_scan_ht_pics(str #ifdef CONFIG_SMP static int irq_choose_cpu(unsigned int virt_irq) { - cpumask_t mask = irq_desc[virt_irq].affinity; + cpumask_t mask; int cpuid; + cpumask_copy(&mask, irq_desc[virt_irq].affinity); if (cpus_equal(mask, CPU_MASK_ALL)) { static int irq_rover; static DEFINE_SPINLOCK(irq_rover_lock); Index: linux-2.6-tip/arch/s390/include/asm/smp.h =================================================================== --- linux-2.6-tip.orig/arch/s390/include/asm/smp.h +++ linux-2.6-tip/arch/s390/include/asm/smp.h @@ -97,12 +97,6 @@ extern void arch_send_call_function_ipi( #endif #ifndef CONFIG_SMP -static inline void smp_send_stop(void) -{ - /* Disable all interrupts/machine checks */ - __load_psw_mask(psw_kernel_bits & ~PSW_MASK_MCHECK); -} - #define hard_smp_processor_id() 0 #define smp_cpu_not_running(cpu) 1 #endif Index: linux-2.6-tip/arch/sh/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/sh/kernel/irq.c +++ linux-2.6-tip/arch/sh/kernel/irq.c @@ -51,7 +51,7 @@ int show_interrupts(struct seq_file *p, goto unlock; seq_printf(p, "%3d: ",i); for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); seq_printf(p, " %14s", irq_desc[i].chip->name); seq_printf(p, "-%-8s", irq_desc[i].name); seq_printf(p, " %s", action->name); Index: linux-2.6-tip/arch/sparc/include/asm/mmzone.h =================================================================== --- linux-2.6-tip.orig/arch/sparc/include/asm/mmzone.h +++ linux-2.6-tip/arch/sparc/include/asm/mmzone.h @@ -3,6 +3,8 @@ #ifdef CONFIG_NEED_MULTIPLE_NODES +#include + extern struct pglist_data *node_data[]; #define NODE_DATA(nid) (node_data[nid]) Index: linux-2.6-tip/arch/sparc/kernel/irq_64.c =================================================================== --- linux-2.6-tip.orig/arch/sparc/kernel/irq_64.c +++ linux-2.6-tip/arch/sparc/kernel/irq_64.c @@ -185,7 +185,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%10u ", kstat_irqs(i)); #else for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #endif seq_printf(p, " %9s", irq_desc[i].chip->typename); seq_printf(p, " %s", action->name); @@ -252,9 +252,10 @@ struct irq_handler_data { #ifdef CONFIG_SMP static int irq_choose_cpu(unsigned int virt_irq) { - cpumask_t mask = irq_desc[virt_irq].affinity; + cpumask_t mask; int cpuid; + cpumask_copy(&mask, irq_desc[virt_irq].affinity); if (cpus_equal(mask, CPU_MASK_ALL)) { static int irq_rover; static DEFINE_SPINLOCK(irq_rover_lock); @@ -805,7 +806,7 @@ void fixup_irqs(void) !(irq_desc[irq].status & IRQ_PER_CPU)) { if (irq_desc[irq].chip->set_affinity) irq_desc[irq].chip->set_affinity(irq, - &irq_desc[irq].affinity); + irq_desc[irq].affinity); } spin_unlock_irqrestore(&irq_desc[irq].lock, flags); } Index: linux-2.6-tip/arch/sparc/kernel/time_64.c =================================================================== --- linux-2.6-tip.orig/arch/sparc/kernel/time_64.c +++ linux-2.6-tip/arch/sparc/kernel/time_64.c @@ -36,10 +36,10 @@ #include #include #include +#include #include #include -#include #include #include #include @@ -729,7 +729,7 @@ void timer_interrupt(int irq, struct pt_ irq_enter(); - kstat_this_cpu.irqs[0]++; + kstat_incr_irqs_this_cpu(0, irq_to_desc(0)); if (unlikely(!evt->event_handler)) { printk(KERN_WARNING Index: linux-2.6-tip/arch/um/include/asm/ftrace.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/um/include/asm/ftrace.h @@ -0,0 +1 @@ +/* empty */ Index: linux-2.6-tip/arch/um/kernel/irq.c =================================================================== --- linux-2.6-tip.orig/arch/um/kernel/irq.c +++ linux-2.6-tip/arch/um/kernel/irq.c @@ -42,7 +42,7 @@ int show_interrupts(struct seq_file *p, seq_printf(p, "%10u ", kstat_irqs(i)); #else for_each_online_cpu(j) - seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]); + seq_printf(p, "%10u ", kstat_irqs_cpu(i, j)); #endif seq_printf(p, " %14s", irq_desc[i].chip->typename); seq_printf(p, " %s", action->name); Index: linux-2.6-tip/arch/x86/Kconfig =================================================================== --- linux-2.6-tip.orig/arch/x86/Kconfig +++ linux-2.6-tip/arch/x86/Kconfig @@ -5,7 +5,7 @@ mainmenu "Linux Kernel Configuration for config 64BIT bool "64-bit kernel" if ARCH = "x86" default ARCH = "x86_64" - help + ---help--- Say yes to build a 64-bit kernel - formerly known as x86_64 Say no to build a 32-bit kernel - formerly known as i386 @@ -34,12 +34,19 @@ config X86 select HAVE_FUNCTION_TRACER select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACE_MCOUNT_TEST - select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64) - select HAVE_ARCH_KGDB if !X86_VOYAGER + select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE + select HAVE_FTRACE_SYSCALLS + select HAVE_KVM + select HAVE_ARCH_KGDB select HAVE_ARCH_TRACEHOOK select HAVE_GENERIC_DMA_COHERENT if X86_32 select HAVE_EFFICIENT_UNALIGNED_ACCESS select USER_STACKTRACE_SUPPORT + select HAVE_KERNEL_GZIP + select HAVE_KERNEL_BZIP2 + select HAVE_KERNEL_LZMA + select HAVE_ARCH_KMEMCHECK + select HAVE_DMA_API_DEBUG config ARCH_DEFCONFIG string @@ -108,10 +115,18 @@ config ARCH_MAY_HAVE_PC_FDC def_bool y config RWSEM_GENERIC_SPINLOCK - def_bool !X86_XADD + bool + depends on !X86_XADD || PREEMPT_RT + default y + +config ASM_SEMAPHORES + bool + default y config RWSEM_XCHGADD_ALGORITHM - def_bool X86_XADD + bool + depends on X86_XADD && !RWSEM_GENERIC_SPINLOCK && !PREEMPT_RT + default y config ARCH_HAS_CPU_IDLE_WAIT def_bool y @@ -133,18 +148,19 @@ config ARCH_HAS_CACHE_LINE_SIZE def_bool y config HAVE_SETUP_PER_CPU_AREA - def_bool X86_64_SMP || (X86_SMP && !X86_VOYAGER) + def_bool y + +config HAVE_DYNAMIC_PER_CPU_AREA + def_bool y config HAVE_CPUMASK_OF_CPU_MAP def_bool X86_64_SMP config ARCH_HIBERNATION_POSSIBLE def_bool y - depends on !SMP || !X86_VOYAGER config ARCH_SUSPEND_POSSIBLE def_bool y - depends on !X86_VOYAGER config ZONE_DMA32 bool @@ -165,6 +181,9 @@ config GENERIC_HARDIRQS bool default y +config GENERIC_HARDIRQS_NO__DO_IRQ + def_bool y + config GENERIC_IRQ_PROBE bool default y @@ -174,11 +193,6 @@ config GENERIC_PENDING_IRQ depends on GENERIC_HARDIRQS && SMP default y -config X86_SMP - bool - depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64) - default y - config USE_GENERIC_SMP_HELPERS def_bool y depends on SMP @@ -194,19 +208,17 @@ config X86_64_SMP config X86_HT bool depends on SMP - depends on (X86_32 && !X86_VOYAGER) || X86_64 - default y - -config X86_BIOS_REBOOT - bool - depends on !X86_VOYAGER default y config X86_TRAMPOLINE bool - depends on X86_SMP || (X86_VOYAGER && SMP) || (64BIT && ACPI_SLEEP) + depends on SMP || (64BIT && ACPI_SLEEP) default y +config X86_32_LAZY_GS + def_bool y + depends on X86_32 && !CC_STACKPROTECTOR + config KTIME_SCALAR def_bool X86_32 source "init/Kconfig" @@ -244,14 +256,24 @@ config SMP If you don't know what to do here, say N. -config X86_HAS_BOOT_CPU_ID - def_bool y - depends on X86_VOYAGER +config X86_X2APIC + bool "Support x2apic" + depends on X86_LOCAL_APIC && X86_64 + ---help--- + This enables x2apic support on CPUs that have this feature. + + This allows 32-bit apic IDs (so it can support very large systems), + and accesses the local apic via MSRs not via mmio. + + ( On certain CPU models you may need to enable INTR_REMAP too, + to get functional x2apic mode. ) + + If you don't know what to do here, say N. config SPARSE_IRQ bool "Support sparse irq numbering" depends on PCI_MSI || HT_IRQ - help + ---help--- This enables support for sparse irqs. This is useful for distro kernels that want to define a high CONFIG_NR_CPUS value but still want to have low kernel memory footprint on smaller machines. @@ -265,114 +287,140 @@ config NUMA_MIGRATE_IRQ_DESC bool "Move irq desc when changing irq smp_affinity" depends on SPARSE_IRQ && NUMA default n - help + ---help--- This enables moving irq_desc to cpu/node that irq will use handled. If you don't know what to do here, say N. -config X86_FIND_SMP_CONFIG - def_bool y - depends on X86_MPPARSE || X86_VOYAGER - config X86_MPPARSE bool "Enable MPS table" if ACPI default y depends on X86_LOCAL_APIC - help + ---help--- For old smp systems that do not have proper acpi support. Newer systems (esp with 64bit cpus) with acpi support, MADT and DSDT will override it -choice - prompt "Subarchitecture Type" - default X86_PC +config X86_BIGSMP + bool "Support for big SMP systems with more than 8 CPUs" + depends on X86_32 && SMP + ---help--- + This option is needed for the systems that have more than 8 CPUs -config X86_PC - bool "PC-compatible" - help - Choose this option if your computer is a standard PC or compatible. +if X86_32 +config X86_EXTENDED_PLATFORM + bool "Support for extended (non-PC) x86 platforms" + default y + ---help--- + If you disable this option then the kernel will only support + standard PC platforms. (which covers the vast majority of + systems out there.) + + If you enable this option then you'll be able to select support + for the following (non-PC) 32 bit x86 platforms: + AMD Elan + NUMAQ (IBM/Sequent) + RDC R-321x SoC + SGI 320/540 (Visual Workstation) + Summit/EXA (IBM x440) + Unisys ES7000 IA32 series + + If you have one of these systems, or if you want to build a + generic distribution kernel, say Y here - otherwise say N. +endif + +if X86_64 +config X86_EXTENDED_PLATFORM + bool "Support for extended (non-PC) x86 platforms" + default y + ---help--- + If you disable this option then the kernel will only support + standard PC platforms. (which covers the vast majority of + systems out there.) + + If you enable this option then you'll be able to select support + for the following (non-PC) 64 bit x86 platforms: + ScaleMP vSMP + SGI Ultraviolet + + If you have one of these systems, or if you want to build a + generic distribution kernel, say Y here - otherwise say N. +endif +# This is an alphabetically sorted list of 64 bit extended platforms +# Please maintain the alphabetic order if and when there are additions + +config X86_VSMP + bool "ScaleMP vSMP" + select PARAVIRT + depends on X86_64 && PCI + depends on X86_EXTENDED_PLATFORM + ---help--- + Support for ScaleMP vSMP systems. Say 'Y' here if this kernel is + supposed to run on these EM64T-based machines. Only choose this option + if you have one of these machines. + +config X86_UV + bool "SGI Ultraviolet" + depends on X86_64 + depends on X86_EXTENDED_PLATFORM + select X86_X2APIC + ---help--- + This option is needed in order to support SGI Ultraviolet systems. + If you don't have one of these, you should say N here. + +# Following is an alphabetically sorted list of 32 bit extended platforms +# Please maintain the alphabetic order if and when there are additions config X86_ELAN bool "AMD Elan" depends on X86_32 - help + depends on X86_EXTENDED_PLATFORM + ---help--- Select this for an AMD Elan processor. Do not use this option for K6/Athlon/Opteron processors! If unsure, choose "PC-compatible" instead. -config X86_VOYAGER - bool "Voyager (NCR)" - depends on X86_32 && (SMP || BROKEN) && !PCI - help - Voyager is an MCA-based 32-way capable SMP architecture proprietary - to NCR Corp. Machine classes 345x/35xx/4100/51xx are Voyager-based. - - *** WARNING *** - - If you do not specifically know you have a Voyager based machine, - say N here, otherwise the kernel you build will not be bootable. - -config X86_GENERICARCH - bool "Generic architecture" +config X86_RDC321X + bool "RDC R-321x SoC" depends on X86_32 - help - This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default + depends on X86_EXTENDED_PLATFORM + select M486 + select X86_REBOOTFIXUPS + ---help--- + This option is needed for RDC R-321x system-on-chip, also known + as R-8610-(G). + If you don't have one of these chips, you should say N here. + +config X86_32_NON_STANDARD + bool "Support non-standard 32-bit SMP architectures" + depends on X86_32 && SMP + depends on X86_EXTENDED_PLATFORM + ---help--- + This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default subarchitectures. It is intended for a generic binary kernel. if you select them all, kernel will probe it one by one. and will fallback to default. -if X86_GENERICARCH +# Alphabetically sorted list of Non standard 32 bit platforms config X86_NUMAQ bool "NUMAQ (IBM/Sequent)" - depends on SMP && X86_32 && PCI && X86_MPPARSE + depends on X86_32_NON_STANDARD select NUMA - help + select X86_MPPARSE + ---help--- This option is used for getting Linux to run on a NUMAQ (IBM/Sequent) NUMA multiquad box. This changes the way that processors are bootstrapped, and uses Clustered Logical APIC addressing mode instead of Flat Logical. You will need a new lynxer.elf file to flash your firmware with - send email to . -config X86_SUMMIT - bool "Summit/EXA (IBM x440)" - depends on X86_32 && SMP - help - This option is needed for IBM systems that use the Summit/EXA chipset. - In particular, it is needed for the x440. - -config X86_ES7000 - bool "Support for Unisys ES7000 IA32 series" - depends on X86_32 && SMP - help - Support for Unisys ES7000 systems. Say 'Y' here if this kernel is - supposed to run on an IA32-based Unisys ES7000 system. - -config X86_BIGSMP - bool "Support for big SMP systems with more than 8 CPUs" - depends on X86_32 && SMP - help - This option is needed for the systems that have more than 8 CPUs - and if the system is not of any sub-arch type above. - -endif - -config X86_VSMP - bool "Support for ScaleMP vSMP" - select PARAVIRT - depends on X86_64 && PCI - help - Support for ScaleMP vSMP systems. Say 'Y' here if this kernel is - supposed to run on these EM64T-based machines. Only choose this option - if you have one of these machines. - -endchoice - config X86_VISWS bool "SGI 320/540 (Visual Workstation)" - depends on X86_32 && PCI && !X86_VOYAGER && X86_MPPARSE && PCI_GODIRECT - help + depends on X86_32 && PCI && X86_MPPARSE && PCI_GODIRECT + depends on X86_32_NON_STANDARD + ---help--- The SGI Visual Workstation series is an IA32-based workstation based on SGI systems chips with some legacy PC hardware attached. @@ -381,21 +429,25 @@ config X86_VISWS A kernel compiled for the Visual Workstation will run on general PCs as well. See for details. -config X86_RDC321X - bool "RDC R-321x SoC" - depends on X86_32 - select M486 - select X86_REBOOTFIXUPS - help - This option is needed for RDC R-321x system-on-chip, also known - as R-8610-(G). - If you don't have one of these chips, you should say N here. +config X86_SUMMIT + bool "Summit/EXA (IBM x440)" + depends on X86_32_NON_STANDARD + ---help--- + This option is needed for IBM systems that use the Summit/EXA chipset. + In particular, it is needed for the x440. + +config X86_ES7000 + bool "Unisys ES7000 IA32 series" + depends on X86_32_NON_STANDARD && X86_BIGSMP + ---help--- + Support for Unisys ES7000 systems. Say 'Y' here if this kernel is + supposed to run on an IA32-based Unisys ES7000 system. config SCHED_OMIT_FRAME_POINTER def_bool y prompt "Single-depth WCHAN output" depends on X86 - help + ---help--- Calculate simpler /proc//wchan values. If this option is disabled then wchan values will recurse back to the caller function. This provides more accurate wchan values, @@ -405,7 +457,7 @@ config SCHED_OMIT_FRAME_POINTER menuconfig PARAVIRT_GUEST bool "Paravirtualized guest support" - help + ---help--- Say Y here to get to see options related to running Linux under various hypervisors. This option alone does not add any kernel code. @@ -419,8 +471,7 @@ config VMI bool "VMI Guest support" select PARAVIRT depends on X86_32 - depends on !X86_VOYAGER - help + ---help--- VMI provides a paravirtualized interface to the VMware ESX server (it could be used by other hypervisors in theory too, but is not at the moment), by linking the kernel to a GPL-ed ROM module @@ -430,8 +481,7 @@ config KVM_CLOCK bool "KVM paravirtualized clock" select PARAVIRT select PARAVIRT_CLOCK - depends on !X86_VOYAGER - help + ---help--- Turning on this option will allow you to run a paravirtualized clock when running over the KVM hypervisor. Instead of relying on a PIT (or probably other) emulation by the underlying device model, the host @@ -441,17 +491,15 @@ config KVM_CLOCK config KVM_GUEST bool "KVM Guest support" select PARAVIRT - depends on !X86_VOYAGER - help - This option enables various optimizations for running under the KVM - hypervisor. + ---help--- + This option enables various optimizations for running under the KVM + hypervisor. source "arch/x86/lguest/Kconfig" config PARAVIRT bool "Enable paravirtualization code" - depends on !X86_VOYAGER - help + ---help--- This changes the kernel so it can modify itself when it is run under a hypervisor, potentially improving performance significantly over full virtualization. However, when run without a hypervisor @@ -464,51 +512,51 @@ config PARAVIRT_CLOCK endif config PARAVIRT_DEBUG - bool "paravirt-ops debugging" - depends on PARAVIRT && DEBUG_KERNEL - help - Enable to debug paravirt_ops internals. Specifically, BUG if - a paravirt_op is missing when it is called. + bool "paravirt-ops debugging" + depends on PARAVIRT && DEBUG_KERNEL + ---help--- + Enable to debug paravirt_ops internals. Specifically, BUG if + a paravirt_op is missing when it is called. config MEMTEST bool "Memtest" - help + ---help--- This option adds a kernel parameter 'memtest', which allows memtest to be set. - memtest=0, mean disabled; -- default - memtest=1, mean do 1 test pattern; - ... - memtest=4, mean do 4 test patterns. + memtest=0, mean disabled; -- default + memtest=1, mean do 1 test pattern; + ... + memtest=4, mean do 4 test patterns. If you are unsure how to answer this question, answer N. config X86_SUMMIT_NUMA def_bool y - depends on X86_32 && NUMA && X86_GENERICARCH + depends on X86_32 && NUMA && X86_32_NON_STANDARD config X86_CYCLONE_TIMER def_bool y - depends on X86_GENERICARCH + depends on X86_32_NON_STANDARD source "arch/x86/Kconfig.cpu" config HPET_TIMER def_bool X86_64 prompt "HPET Timer Support" if X86_32 - help - Use the IA-PC HPET (High Precision Event Timer) to manage - time in preference to the PIT and RTC, if a HPET is - present. - HPET is the next generation timer replacing legacy 8254s. - The HPET provides a stable time base on SMP - systems, unlike the TSC, but it is more expensive to access, - as it is off-chip. You can find the HPET spec at - . - - You can safely choose Y here. However, HPET will only be - activated if the platform and the BIOS support this feature. - Otherwise the 8254 will be used for timing services. + ---help--- + Use the IA-PC HPET (High Precision Event Timer) to manage + time in preference to the PIT and RTC, if a HPET is + present. + HPET is the next generation timer replacing legacy 8254s. + The HPET provides a stable time base on SMP + systems, unlike the TSC, but it is more expensive to access, + as it is off-chip. You can find the HPET spec at + . + + You can safely choose Y here. However, HPET will only be + activated if the platform and the BIOS support this feature. + Otherwise the 8254 will be used for timing services. - Choose N to continue using the legacy 8254 timer. + Choose N to continue using the legacy 8254 timer. config HPET_EMULATE_RTC def_bool y @@ -519,7 +567,7 @@ config HPET_EMULATE_RTC config DMI default y bool "Enable DMI scanning" if EMBEDDED - help + ---help--- Enabled scanning of DMI to identify machine quirks. Say Y here unless you have verified that your setup is not affected by entries in the DMI blacklist. Required by PNP @@ -531,7 +579,7 @@ config GART_IOMMU select SWIOTLB select AGP depends on X86_64 && PCI - help + ---help--- Support for full DMA access of devices with 32bit memory access only on systems with more than 3GB. This is usually needed for USB, sound, many IDE/SATA chipsets and some other devices. @@ -546,7 +594,7 @@ config CALGARY_IOMMU bool "IBM Calgary IOMMU support" select SWIOTLB depends on X86_64 && PCI && EXPERIMENTAL - help + ---help--- Support for hardware IOMMUs in IBM's xSeries x366 and x460 systems. Needed to run systems with more than 3GB of memory properly with 32-bit PCI devices that do not support DAC @@ -564,7 +612,7 @@ config CALGARY_IOMMU_ENABLED_BY_DEFAULT def_bool y prompt "Should Calgary be enabled by default?" depends on CALGARY_IOMMU - help + ---help--- Should Calgary be enabled by default? if you choose 'y', Calgary will be used (if it exists). If you choose 'n', Calgary will not be used even if it exists. If you choose 'n' and would like to use @@ -576,7 +624,7 @@ config AMD_IOMMU select SWIOTLB select PCI_MSI depends on X86_64 && PCI && ACPI - help + ---help--- With this option you can enable support for AMD IOMMU hardware in your system. An IOMMU is a hardware component which provides remapping of DMA memory accesses from devices. With an AMD IOMMU you @@ -591,7 +639,7 @@ config AMD_IOMMU_STATS bool "Export AMD IOMMU statistics to debugfs" depends on AMD_IOMMU select DEBUG_FS - help + ---help--- This option enables code in the AMD IOMMU driver to collect various statistics about whats happening in the driver and exports that information to userspace via debugfs. @@ -600,7 +648,7 @@ config AMD_IOMMU_STATS # need this always selected by IOMMU for the VIA workaround config SWIOTLB def_bool y if X86_64 - help + ---help--- Support for software bounce buffers used on x86-64 systems which don't have a hardware IOMMU (e.g. the current generation of Intel's x86-64 CPUs). Using this PCI devices which can only @@ -615,10 +663,10 @@ config IOMMU_API config MAXSMP bool "Configure Maximum number of SMP Processors and NUMA Nodes" - depends on X86_64 && SMP && DEBUG_KERNEL && EXPERIMENTAL + depends on 0 && X86_64 && SMP && DEBUG_KERNEL && EXPERIMENTAL select CPUMASK_OFFSTACK default n - help + ---help--- Configure maximum number of CPUS and NUMA Nodes for this architecture. If unsure, say N. @@ -629,7 +677,7 @@ config NR_CPUS default "4096" if MAXSMP default "32" if SMP && (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000) default "8" if SMP - help + ---help--- This allows you to specify the maximum number of CPUs which this kernel will support. The maximum supported value is 512 and the minimum value which makes sense is 2. @@ -640,7 +688,7 @@ config NR_CPUS config SCHED_SMT bool "SMT (Hyperthreading) scheduler support" depends on X86_HT - help + ---help--- SMT scheduler support improves the CPU scheduler's decision making when dealing with Intel Pentium 4 chips with HyperThreading at a cost of slightly increased overhead in some places. If unsure say @@ -650,7 +698,7 @@ config SCHED_MC def_bool y prompt "Multi-core scheduler support" depends on X86_HT - help + ---help--- Multi-core scheduler support improves the CPU scheduler's decision making when dealing with multi-core CPU chips at a cost of slightly increased overhead in some places. If unsure say N here. @@ -659,8 +707,8 @@ source "kernel/Kconfig.preempt" config X86_UP_APIC bool "Local APIC support on uniprocessors" - depends on X86_32 && !SMP && !(X86_VOYAGER || X86_GENERICARCH) - help + depends on X86_32 && !SMP && !X86_32_NON_STANDARD + ---help--- A local APIC (Advanced Programmable Interrupt Controller) is an integrated interrupt controller in the CPU. If you have a single-CPU system which has a processor with a local APIC, you can say Y here to @@ -673,7 +721,7 @@ config X86_UP_APIC config X86_UP_IOAPIC bool "IO-APIC support on uniprocessors" depends on X86_UP_APIC - help + ---help--- An IO-APIC (I/O Advanced Programmable Interrupt Controller) is an SMP-capable replacement for PC-style interrupt controllers. Most SMP systems and many recent uniprocessor systems have one. @@ -684,11 +732,12 @@ config X86_UP_IOAPIC config X86_LOCAL_APIC def_bool y - depends on X86_64 || (X86_32 && (X86_UP_APIC || (SMP && !X86_VOYAGER) || X86_GENERICARCH)) + depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC + select HAVE_PERF_COUNTERS if (!M386 && !M486) config X86_IO_APIC def_bool y - depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && !X86_VOYAGER) || X86_GENERICARCH)) + depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC config X86_VISWS_APIC def_bool y @@ -698,7 +747,7 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS bool "Reroute for broken boot IRQs" default n depends on X86_IO_APIC - help + ---help--- This option enables a workaround that fixes a source of spurious interrupts. This is recommended when threaded interrupt handling is used on systems where the generation of @@ -720,7 +769,6 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS config X86_MCE bool "Machine Check Exception" - depends on !X86_VOYAGER ---help--- Machine Check Exception support allows the processor to notify the kernel if it detects a problem (e.g. overheating, component failure). @@ -739,7 +787,7 @@ config X86_MCE_INTEL def_bool y prompt "Intel MCE features" depends on X86_64 && X86_MCE && X86_LOCAL_APIC - help + ---help--- Additional support for intel specific MCE features such as the thermal monitor. @@ -747,14 +795,19 @@ config X86_MCE_AMD def_bool y prompt "AMD MCE features" depends on X86_64 && X86_MCE && X86_LOCAL_APIC - help + ---help--- Additional support for AMD specific MCE features such as the DRAM Error Threshold. +config X86_MCE_THRESHOLD + depends on X86_MCE_AMD || X86_MCE_INTEL + bool + default y + config X86_MCE_NONFATAL tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4" depends on X86_32 && X86_MCE - help + ---help--- Enabling this feature starts a timer that triggers every 5 seconds which will look at the machine check registers to see if anything happened. Non-fatal problems automatically get corrected (but still logged). @@ -767,7 +820,7 @@ config X86_MCE_NONFATAL config X86_MCE_P4THERMAL bool "check for P4 thermal throttling interrupt." depends on X86_32 && X86_MCE && (X86_UP_APIC || SMP) - help + ---help--- Enabling this feature will cause a message to be printed when the P4 enters thermal throttling. @@ -775,11 +828,11 @@ config VM86 bool "Enable VM86 support" if EMBEDDED default y depends on X86_32 - help - This option is required by programs like DOSEMU to run 16-bit legacy + ---help--- + This option is required by programs like DOSEMU to run 16-bit legacy code on X86 processors. It also may be needed by software like - XFree86 to initialize some video cards via BIOS. Disabling this - option saves about 6k. + XFree86 to initialize some video cards via BIOS. Disabling this + option saves about 6k. config TOSHIBA tristate "Toshiba Laptop support" @@ -853,33 +906,33 @@ config MICROCODE module will be called microcode. config MICROCODE_INTEL - bool "Intel microcode patch loading support" - depends on MICROCODE - default MICROCODE - select FW_LOADER - --help--- - This options enables microcode patch loading support for Intel - processors. - - For latest news and information on obtaining all the required - Intel ingredients for this driver, check: - . + bool "Intel microcode patch loading support" + depends on MICROCODE + default MICROCODE + select FW_LOADER + ---help--- + This options enables microcode patch loading support for Intel + processors. + + For latest news and information on obtaining all the required + Intel ingredients for this driver, check: + . config MICROCODE_AMD - bool "AMD microcode patch loading support" - depends on MICROCODE - select FW_LOADER - --help--- - If you select this option, microcode patch loading support for AMD - processors will be enabled. + bool "AMD microcode patch loading support" + depends on MICROCODE + select FW_LOADER + ---help--- + If you select this option, microcode patch loading support for AMD + processors will be enabled. - config MICROCODE_OLD_INTERFACE +config MICROCODE_OLD_INTERFACE def_bool y depends on MICROCODE config X86_MSR tristate "/dev/cpu/*/msr - Model-specific register support" - help + ---help--- This device gives privileged processes access to the x86 Model-Specific Registers (MSRs). It is a character device with major 202 and minors 0 to 31 for /dev/cpu/0/msr to /dev/cpu/31/msr. @@ -888,12 +941,18 @@ config X86_MSR config X86_CPUID tristate "/dev/cpu/*/cpuid - CPU information support" - help + ---help--- This device gives processes access to the x86 CPUID instruction to be executed on a specific processor. It is a character device with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to /dev/cpu/31/cpuid. +config X86_CPU_DEBUG + tristate "/sys/kernel/debug/x86/cpu/* - CPU Debug support" + ---help--- + If you select this option, this will provide various x86 CPUs + information through debugfs. + choice prompt "High Memory Support" default HIGHMEM4G if !X86_NUMAQ @@ -940,7 +999,7 @@ config NOHIGHMEM config HIGHMEM4G bool "4GB" depends on !X86_NUMAQ - help + ---help--- Select this if you have a 32-bit processor and between 1 and 4 gigabytes of physical RAM. @@ -948,7 +1007,7 @@ config HIGHMEM64G bool "64GB" depends on !M386 && !M486 select X86_PAE - help + ---help--- Select this if you have a 32-bit processor and more than 4 gigabytes of physical RAM. @@ -959,7 +1018,7 @@ choice prompt "Memory split" if EMBEDDED default VMSPLIT_3G depends on X86_32 - help + ---help--- Select the desired split between kernel and user memory. If the address range available to the kernel is less than the @@ -1005,20 +1064,20 @@ config HIGHMEM config X86_PAE bool "PAE (Physical Address Extension) Support" depends on X86_32 && !HIGHMEM4G - help + ---help--- PAE is required for NX support, and furthermore enables larger swapspace support for non-overcommit purposes. It has the cost of more pagetable lookup overhead, and also consumes more pagetable space per process. config ARCH_PHYS_ADDR_T_64BIT - def_bool X86_64 || X86_PAE + def_bool X86_64 || X86_PAE config DIRECT_GBPAGES bool "Enable 1GB pages for kernel pagetables" if EMBEDDED default y depends on X86_64 - help + ---help--- Allow the kernel linear mapping to use 1GB pages on CPUs that support it. This can improve the kernel's performance a tiny bit by reducing TLB pressure. If in doubt, say "Y". @@ -1028,9 +1087,8 @@ config NUMA bool "Numa Memory Allocation and Scheduler Support" depends on SMP depends on X86_64 || (X86_32 && HIGHMEM64G && (X86_NUMAQ || X86_BIGSMP || X86_SUMMIT && ACPI) && EXPERIMENTAL) - default n if X86_PC default y if (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP) - help + ---help--- Enable NUMA (Non Uniform Memory Access) support. The kernel will try to allocate memory used by a CPU on the @@ -1053,19 +1111,19 @@ config K8_NUMA def_bool y prompt "Old style AMD Opteron NUMA detection" depends on X86_64 && NUMA && PCI - help - Enable K8 NUMA node topology detection. You should say Y here if - you have a multi processor AMD K8 system. This uses an old - method to read the NUMA configuration directly from the builtin - Northbridge of Opteron. It is recommended to use X86_64_ACPI_NUMA - instead, which also takes priority if both are compiled in. + ---help--- + Enable K8 NUMA node topology detection. You should say Y here if + you have a multi processor AMD K8 system. This uses an old + method to read the NUMA configuration directly from the builtin + Northbridge of Opteron. It is recommended to use X86_64_ACPI_NUMA + instead, which also takes priority if both are compiled in. config X86_64_ACPI_NUMA def_bool y prompt "ACPI NUMA detection" depends on X86_64 && NUMA && ACPI && PCI select ACPI_NUMA - help + ---help--- Enable ACPI SRAT based node topology detection. # Some NUMA nodes have memory ranges that span @@ -1080,24 +1138,24 @@ config NODES_SPAN_OTHER_NODES config NUMA_EMU bool "NUMA emulation" depends on X86_64 && NUMA - help + ---help--- Enable NUMA emulation. A flat machine will be split into virtual nodes when booted with "numa=fake=N", where N is the number of nodes. This is only useful for debugging. config NODES_SHIFT int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP - range 1 9 if X86_64 + range 1 9 default "9" if MAXSMP default "6" if X86_64 default "4" if X86_NUMAQ default "3" depends on NEED_MULTIPLE_NODES - help + ---help--- Specify the maximum number of NUMA Nodes available on the target system. Increases memory reserved to accomodate various tables. -config HAVE_ARCH_BOOTMEM_NODE +config HAVE_ARCH_BOOTMEM def_bool y depends on X86_32 && NUMA @@ -1131,7 +1189,7 @@ config ARCH_SPARSEMEM_DEFAULT config ARCH_SPARSEMEM_ENABLE def_bool y - depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC) || X86_GENERICARCH + depends on X86_64 || NUMA || (EXPERIMENTAL && X86_32) || X86_32_NON_STANDARD select SPARSEMEM_STATIC if X86_32 select SPARSEMEM_VMEMMAP_ENABLE if X86_64 @@ -1143,66 +1201,71 @@ config ARCH_MEMORY_PROBE def_bool X86_64 depends on MEMORY_HOTPLUG +config ILLEGAL_POINTER_VALUE + hex + default 0 if X86_32 + default 0xdead000000000000 if X86_64 + source "mm/Kconfig" config HIGHPTE bool "Allocate 3rd-level pagetables from highmem" depends on X86_32 && (HIGHMEM4G || HIGHMEM64G) - help + ---help--- The VM uses one page table entry for each page of physical memory. For systems with a lot of RAM, this can be wasteful of precious low memory. Setting this option will put user-space page table entries in high memory. config X86_CHECK_BIOS_CORRUPTION - bool "Check for low memory corruption" - help - Periodically check for memory corruption in low memory, which - is suspected to be caused by BIOS. Even when enabled in the - configuration, it is disabled at runtime. Enable it by - setting "memory_corruption_check=1" on the kernel command - line. By default it scans the low 64k of memory every 60 - seconds; see the memory_corruption_check_size and - memory_corruption_check_period parameters in - Documentation/kernel-parameters.txt to adjust this. - - When enabled with the default parameters, this option has - almost no overhead, as it reserves a relatively small amount - of memory and scans it infrequently. It both detects corruption - and prevents it from affecting the running system. - - It is, however, intended as a diagnostic tool; if repeatable - BIOS-originated corruption always affects the same memory, - you can use memmap= to prevent the kernel from using that - memory. + bool "Check for low memory corruption" + ---help--- + Periodically check for memory corruption in low memory, which + is suspected to be caused by BIOS. Even when enabled in the + configuration, it is disabled at runtime. Enable it by + setting "memory_corruption_check=1" on the kernel command + line. By default it scans the low 64k of memory every 60 + seconds; see the memory_corruption_check_size and + memory_corruption_check_period parameters in + Documentation/kernel-parameters.txt to adjust this. + + When enabled with the default parameters, this option has + almost no overhead, as it reserves a relatively small amount + of memory and scans it infrequently. It both detects corruption + and prevents it from affecting the running system. + + It is, however, intended as a diagnostic tool; if repeatable + BIOS-originated corruption always affects the same memory, + you can use memmap= to prevent the kernel from using that + memory. config X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK - bool "Set the default setting of memory_corruption_check" + bool "Set the default setting of memory_corruption_check" depends on X86_CHECK_BIOS_CORRUPTION default y - help - Set whether the default state of memory_corruption_check is - on or off. + ---help--- + Set whether the default state of memory_corruption_check is + on or off. config X86_RESERVE_LOW_64K - bool "Reserve low 64K of RAM on AMI/Phoenix BIOSen" + bool "Reserve low 64K of RAM on AMI/Phoenix BIOSen" default y - help - Reserve the first 64K of physical RAM on BIOSes that are known - to potentially corrupt that memory range. A numbers of BIOSes are - known to utilize this area during suspend/resume, so it must not - be used by the kernel. - - Set this to N if you are absolutely sure that you trust the BIOS - to get all its memory reservations and usages right. - - If you have doubts about the BIOS (e.g. suspend/resume does not - work or there's kernel crashes after certain hardware hotplug - events) and it's not AMI or Phoenix, then you might want to enable - X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical - corruption patterns. + ---help--- + Reserve the first 64K of physical RAM on BIOSes that are known + to potentially corrupt that memory range. A numbers of BIOSes are + known to utilize this area during suspend/resume, so it must not + be used by the kernel. + + Set this to N if you are absolutely sure that you trust the BIOS + to get all its memory reservations and usages right. + + If you have doubts about the BIOS (e.g. suspend/resume does not + work or there's kernel crashes after certain hardware hotplug + events) and it's not AMI or Phoenix, then you might want to enable + X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical + corruption patterns. - Say Y if unsure. + Say Y if unsure. config MATH_EMULATION bool @@ -1268,7 +1331,7 @@ config MTRR_SANITIZER def_bool y prompt "MTRR cleanup support" depends on MTRR - help + ---help--- Convert MTRR layout from continuous to discrete, so X drivers can add writeback entries. @@ -1283,7 +1346,7 @@ config MTRR_SANITIZER_ENABLE_DEFAULT range 0 1 default "0" depends on MTRR_SANITIZER - help + ---help--- Enable mtrr cleanup default value config MTRR_SANITIZER_SPARE_REG_NR_DEFAULT @@ -1291,7 +1354,7 @@ config MTRR_SANITIZER_SPARE_REG_NR_DEFAU range 0 7 default "1" depends on MTRR_SANITIZER - help + ---help--- mtrr cleanup spare entries default, it can be changed via mtrr_spare_reg_nr=N on the kernel command line. @@ -1299,7 +1362,7 @@ config X86_PAT bool prompt "x86 PAT support" depends on MTRR - help + ---help--- Use PAT attributes to setup page level cache control. PATs are the modern equivalents of MTRRs and are much more @@ -1314,20 +1377,20 @@ config EFI bool "EFI runtime service support" depends on ACPI ---help--- - This enables the kernel to use EFI runtime services that are - available (such as the EFI variable services). + This enables the kernel to use EFI runtime services that are + available (such as the EFI variable services). - This option is only useful on systems that have EFI firmware. - In addition, you should use the latest ELILO loader available - at in order to take advantage - of EFI runtime services. However, even with this option, the - resultant kernel should continue to boot on existing non-EFI - platforms. + This option is only useful on systems that have EFI firmware. + In addition, you should use the latest ELILO loader available + at in order to take advantage + of EFI runtime services. However, even with this option, the + resultant kernel should continue to boot on existing non-EFI + platforms. config SECCOMP def_bool y prompt "Enable seccomp to safely compute untrusted bytecode" - help + ---help--- This kernel feature is useful for number crunching applications that may need to compute untrusted bytecode during their execution. By using pipes or other transports made available to @@ -1340,13 +1403,16 @@ config SECCOMP If unsure, say Y. Only embedded should say N here. +config CC_STACKPROTECTOR_ALL + bool + config CC_STACKPROTECTOR bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)" - depends on X86_64 && EXPERIMENTAL && BROKEN - help - This option turns on the -fstack-protector GCC feature. This - feature puts, at the beginning of critical functions, a canary - value on the stack just before the return address, and validates + select CC_STACKPROTECTOR_ALL + ---help--- + This option turns on the -fstack-protector GCC feature. This + feature puts, at the beginning of functions, a canary value on + the stack just before the return address, and validates the value just before actually returning. Stack based buffer overflows (that need to overwrite this return address) now also overwrite the canary, which gets detected and the attack is then @@ -1354,22 +1420,14 @@ config CC_STACKPROTECTOR This feature requires gcc version 4.2 or above, or a distribution gcc with the feature backported. Older versions are automatically - detected and for those versions, this configuration option is ignored. - -config CC_STACKPROTECTOR_ALL - bool "Use stack-protector for all functions" - depends on CC_STACKPROTECTOR - help - Normally, GCC only inserts the canary value protection for - functions that use large-ish on-stack buffers. By enabling - this option, GCC will be asked to do this for ALL functions. + detected and for those versions, this configuration option is + ignored. (and a warning is printed during bootup) source kernel/Kconfig.hz config KEXEC bool "kexec system call" - depends on X86_BIOS_REBOOT - help + ---help--- kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot but it is independent of the system firmware. And like a reboot @@ -1386,7 +1444,7 @@ config KEXEC config CRASH_DUMP bool "kernel crash dumps" depends on X86_64 || (X86_32 && HIGHMEM) - help + ---help--- Generate crash dump after being started by kexec. This should be normally only set in special crash dump kernels which are loaded in the main kernel with kexec-tools into @@ -1400,8 +1458,8 @@ config CRASH_DUMP config KEXEC_JUMP bool "kexec jump (EXPERIMENTAL)" depends on EXPERIMENTAL - depends on KEXEC && HIBERNATION && X86_32 - help + depends on KEXEC && HIBERNATION + ---help--- Jump between original kernel and kexeced kernel and invoke code in physical address mode via KEXEC @@ -1410,7 +1468,7 @@ config PHYSICAL_START default "0x1000000" if X86_NUMAQ default "0x200000" if X86_64 default "0x100000" - help + ---help--- This gives the physical address where the kernel is loaded. If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then @@ -1451,7 +1509,7 @@ config PHYSICAL_START config RELOCATABLE bool "Build a relocatable kernel (EXPERIMENTAL)" depends on EXPERIMENTAL - help + ---help--- This builds a kernel image that retains relocation information so it can be loaded someplace besides the default 1MB. The relocations tend to make the kernel binary about 10% larger, @@ -1471,7 +1529,7 @@ config PHYSICAL_ALIGN default "0x100000" if X86_32 default "0x200000" if X86_64 range 0x2000 0x400000 - help + ---help--- This value puts the alignment restrictions on physical address where kernel is loaded and run from. Kernel is compiled for an address which meets above alignment restriction. @@ -1492,7 +1550,7 @@ config PHYSICAL_ALIGN config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" - depends on SMP && HOTPLUG && !X86_VOYAGER + depends on SMP && HOTPLUG ---help--- Say Y here to allow turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. @@ -1504,7 +1562,7 @@ config COMPAT_VDSO def_bool y prompt "Compat VDSO support" depends on X86_32 || IA32_EMULATION - help + ---help--- Map the 32-bit VDSO to the predictable old-style address too. ---help--- Say N here if you are running a sufficiently recent glibc @@ -1516,7 +1574,7 @@ config COMPAT_VDSO config CMDLINE_BOOL bool "Built-in kernel command line" default n - help + ---help--- Allow for specifying boot arguments to the kernel at build time. On some systems (e.g. embedded ones), it is necessary or convenient to provide some or all of the @@ -1534,7 +1592,7 @@ config CMDLINE string "Built-in kernel command string" depends on CMDLINE_BOOL default "" - help + ---help--- Enter arguments here that should be compiled into the kernel image and used at boot time. If the boot loader provides a command line at boot time, it is appended to this string to @@ -1551,7 +1609,7 @@ config CMDLINE_OVERRIDE bool "Built-in command line overrides boot loader arguments" default n depends on CMDLINE_BOOL - help + ---help--- Set this option to 'Y' to have the kernel ignore the boot loader command line, and use ONLY the built-in command line. @@ -1572,8 +1630,11 @@ config HAVE_ARCH_EARLY_PFN_TO_NID def_bool X86_64 depends on NUMA +config HARDIRQS_SW_RESEND + bool + default y + menu "Power management and ACPI options" - depends on !X86_VOYAGER config ARCH_HIBERNATION_HEADER def_bool y @@ -1651,7 +1712,7 @@ if APM config APM_IGNORE_USER_SUSPEND bool "Ignore USER SUSPEND" - help + ---help--- This option will ignore USER SUSPEND requests. On machines with a compliant APM BIOS, you want to say N. However, on the NEC Versa M series notebooks, it is necessary to say Y because of a BIOS bug. @@ -1675,7 +1736,7 @@ config APM_DO_ENABLE config APM_CPU_IDLE bool "Make CPU Idle calls when idle" - help + ---help--- Enable calls to APM CPU Idle/CPU Busy inside the kernel's idle loop. On some machines, this can activate improved power savings, such as a slowed CPU clock rate, when the machine is idle. These idle calls @@ -1686,7 +1747,7 @@ config APM_CPU_IDLE config APM_DISPLAY_BLANK bool "Enable console blanking using APM" - help + ---help--- Enable console blanking using the APM. Some laptops can use this to turn off the LCD backlight when the screen blanker of the Linux virtual console blanks the screen. Note that this is only used by @@ -1699,7 +1760,7 @@ config APM_DISPLAY_BLANK config APM_ALLOW_INTS bool "Allow interrupts during APM BIOS calls" - help + ---help--- Normally we disable external interrupts while we are making calls to the APM BIOS as a measure to lessen the effects of a badly behaving BIOS implementation. The BIOS should reenable interrupts if it @@ -1724,7 +1785,7 @@ config PCI bool "PCI support" default y select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC) - help + ---help--- Find out whether you have a PCI motherboard. PCI is the name of a bus system, i.e. the way the CPU talks to the other stuff inside your box. Other bus systems are ISA, EISA, MicroChannel (MCA) or @@ -1795,7 +1856,7 @@ config PCI_MMCONFIG config DMAR bool "Support for DMA Remapping Devices (EXPERIMENTAL)" depends on X86_64 && PCI_MSI && ACPI && EXPERIMENTAL - help + ---help--- DMA remapping (DMAR) devices support enables independent address translations for Direct Memory Access (DMA) from devices. These DMA remapping devices are reported via ACPI tables @@ -1817,29 +1878,30 @@ config DMAR_GFX_WA def_bool y prompt "Support for Graphics workaround" depends on DMAR - help - Current Graphics drivers tend to use physical address - for DMA and avoid using DMA APIs. Setting this config - option permits the IOMMU driver to set a unity map for - all the OS-visible memory. Hence the driver can continue - to use physical addresses for DMA. + ---help--- + Current Graphics drivers tend to use physical address + for DMA and avoid using DMA APIs. Setting this config + option permits the IOMMU driver to set a unity map for + all the OS-visible memory. Hence the driver can continue + to use physical addresses for DMA. config DMAR_FLOPPY_WA def_bool y depends on DMAR - help - Floppy disk drivers are know to bypass DMA API calls - thereby failing to work when IOMMU is enabled. This - workaround will setup a 1:1 mapping for the first - 16M to make floppy (an ISA device) work. + ---help--- + Floppy disk drivers are know to bypass DMA API calls + thereby failing to work when IOMMU is enabled. This + workaround will setup a 1:1 mapping for the first + 16M to make floppy (an ISA device) work. config INTR_REMAP bool "Support for Interrupt Remapping (EXPERIMENTAL)" depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI && EXPERIMENTAL - help - Supports Interrupt remapping for IO-APIC and MSI devices. - To use x2apic mode in the CPU's which support x2APIC enhancements or - to support platforms with CPU's having > 8 bit APIC ID, say Y. + select X86_X2APIC + ---help--- + Supports Interrupt remapping for IO-APIC and MSI devices. + To use x2apic mode in the CPU's which support x2APIC enhancements or + to support platforms with CPU's having > 8 bit APIC ID, say Y. source "drivers/pci/pcie/Kconfig" @@ -1853,8 +1915,7 @@ if X86_32 config ISA bool "ISA support" - depends on !X86_VOYAGER - help + ---help--- Find out whether you have ISA slots on your motherboard. ISA is the name of a bus system, i.e. the way the CPU talks to the other stuff inside your box. Other bus systems are PCI, EISA, MicroChannel @@ -1880,9 +1941,8 @@ config EISA source "drivers/eisa/Kconfig" config MCA - bool "MCA support" if !X86_VOYAGER - default y if X86_VOYAGER - help + bool "MCA support" + ---help--- MicroChannel Architecture is found in some IBM PS/2 machines and laptops. It is a bus system similar to PCI or ISA. See (and especially the web page given @@ -1892,8 +1952,7 @@ source "drivers/mca/Kconfig" config SCx200 tristate "NatSemi SCx200 support" - depends on !X86_VOYAGER - help + ---help--- This provides basic support for National Semiconductor's (now AMD's) Geode processors. The driver probes for the PCI-IDs of several on-chip devices, so its a good dependency @@ -1905,7 +1964,7 @@ config SCx200HR_TIMER tristate "NatSemi SCx200 27MHz High-Resolution Timer Support" depends on SCx200 && GENERIC_TIME default y - help + ---help--- This driver provides a clocksource built upon the on-chip 27MHz high-resolution timer. Its also a workaround for NSC Geode SC-1100's buggy TSC, which loses time when the @@ -1916,7 +1975,7 @@ config GEODE_MFGPT_TIMER def_bool y prompt "Geode Multi-Function General Purpose Timer (MFGPT) events" depends on MGEODE_LX && GENERIC_TIME && GENERIC_CLOCKEVENTS - help + ---help--- This driver provides a clock event source based on the MFGPT timer(s) in the CS5535 and CS5536 companion chip for the geode. MFGPTs have a better resolution and max interval than the @@ -1925,7 +1984,7 @@ config GEODE_MFGPT_TIMER config OLPC bool "One Laptop Per Child support" default n - help + ---help--- Add support for detecting the unique features of the OLPC XO hardware. @@ -1950,16 +2009,16 @@ config IA32_EMULATION bool "IA32 Emulation" depends on X86_64 select COMPAT_BINFMT_ELF - help + ---help--- Include code to run 32-bit programs under a 64-bit kernel. You should likely turn this on, unless you're 100% sure that you don't have any 32-bit programs left. config IA32_AOUT - tristate "IA32 a.out support" - depends on IA32_EMULATION - help - Support old a.out binaries in the 32bit emulation. + tristate "IA32 a.out support" + depends on IA32_EMULATION + ---help--- + Support old a.out binaries in the 32bit emulation. config COMPAT def_bool y Index: linux-2.6-tip/arch/x86/Kconfig.cpu =================================================================== --- linux-2.6-tip.orig/arch/x86/Kconfig.cpu +++ linux-2.6-tip/arch/x86/Kconfig.cpu @@ -50,7 +50,7 @@ config M386 config M486 bool "486" depends on X86_32 - help + ---help--- Select this for a 486 series processor, either Intel or one of the compatible processors from AMD, Cyrix, IBM, or Intel. Includes DX, DX2, and DX4 variants; also SL/SLC/SLC2/SLC3/SX/SX2 and UMC U5D or @@ -59,7 +59,7 @@ config M486 config M586 bool "586/K5/5x86/6x86/6x86MX" depends on X86_32 - help + ---help--- Select this for an 586 or 686 series processor such as the AMD K5, the Cyrix 5x86, 6x86 and 6x86MX. This choice does not assume the RDTSC (Read Time Stamp Counter) instruction. @@ -67,21 +67,21 @@ config M586 config M586TSC bool "Pentium-Classic" depends on X86_32 - help + ---help--- Select this for a Pentium Classic processor with the RDTSC (Read Time Stamp Counter) instruction for benchmarking. config M586MMX bool "Pentium-MMX" depends on X86_32 - help + ---help--- Select this for a Pentium with the MMX graphics/multimedia extended instructions. config M686 bool "Pentium-Pro" depends on X86_32 - help + ---help--- Select this for Intel Pentium Pro chips. This enables the use of Pentium Pro extended instructions, and disables the init-time guard against the f00f bug found in earlier Pentiums. @@ -89,7 +89,7 @@ config M686 config MPENTIUMII bool "Pentium-II/Celeron(pre-Coppermine)" depends on X86_32 - help + ---help--- Select this for Intel chips based on the Pentium-II and pre-Coppermine Celeron core. This option enables an unaligned copy optimization, compiles the kernel with optimization flags @@ -99,7 +99,7 @@ config MPENTIUMII config MPENTIUMIII bool "Pentium-III/Celeron(Coppermine)/Pentium-III Xeon" depends on X86_32 - help + ---help--- Select this for Intel chips based on the Pentium-III and Celeron-Coppermine core. This option enables use of some extended prefetch instructions in addition to the Pentium II @@ -108,14 +108,14 @@ config MPENTIUMIII config MPENTIUMM bool "Pentium M" depends on X86_32 - help + ---help--- Select this for Intel Pentium M (not Pentium-4 M) notebook chips. config MPENTIUM4 bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon" depends on X86_32 - help + ---help--- Select this for Intel Pentium 4 chips. This includes the Pentium 4, Pentium D, P4-based Celeron and Xeon, and Pentium-4 M (not Pentium M) chips. This option enables compile @@ -151,7 +151,7 @@ config MPENTIUM4 config MK6 bool "K6/K6-II/K6-III" depends on X86_32 - help + ---help--- Select this for an AMD K6-family processor. Enables use of some extended instructions, and passes appropriate optimization flags to GCC. @@ -159,14 +159,14 @@ config MK6 config MK7 bool "Athlon/Duron/K7" depends on X86_32 - help + ---help--- Select this for an AMD Athlon K7-family processor. Enables use of some extended instructions, and passes appropriate optimization flags to GCC. config MK8 bool "Opteron/Athlon64/Hammer/K8" - help + ---help--- Select this for an AMD Opteron or Athlon64 Hammer-family processor. Enables use of some extended instructions, and passes appropriate optimization flags to GCC. @@ -174,7 +174,7 @@ config MK8 config MCRUSOE bool "Crusoe" depends on X86_32 - help + ---help--- Select this for a Transmeta Crusoe processor. Treats the processor like a 586 with TSC, and sets some GCC optimization flags (like a Pentium Pro with no alignment requirements). @@ -182,13 +182,13 @@ config MCRUSOE config MEFFICEON bool "Efficeon" depends on X86_32 - help + ---help--- Select this for a Transmeta Efficeon processor. config MWINCHIPC6 bool "Winchip-C6" depends on X86_32 - help + ---help--- Select this for an IDT Winchip C6 chip. Linux and GCC treat this chip as a 586TSC with some extended instructions and alignment requirements. @@ -196,7 +196,7 @@ config MWINCHIPC6 config MWINCHIP3D bool "Winchip-2/Winchip-2A/Winchip-3" depends on X86_32 - help + ---help--- Select this for an IDT Winchip-2, 2A or 3. Linux and GCC treat this chip as a 586TSC with some extended instructions and alignment requirements. Also enable out of order memory @@ -206,19 +206,19 @@ config MWINCHIP3D config MGEODEGX1 bool "GeodeGX1" depends on X86_32 - help + ---help--- Select this for a Geode GX1 (Cyrix MediaGX) chip. config MGEODE_LX bool "Geode GX/LX" depends on X86_32 - help + ---help--- Select this for AMD Geode GX and LX processors. config MCYRIXIII bool "CyrixIII/VIA-C3" depends on X86_32 - help + ---help--- Select this for a Cyrix III or C3 chip. Presently Linux and GCC treat this chip as a generic 586. Whilst the CPU is 686 class, it lacks the cmov extension which gcc assumes is present when @@ -230,7 +230,7 @@ config MCYRIXIII config MVIAC3_2 bool "VIA C3-2 (Nehemiah)" depends on X86_32 - help + ---help--- Select this for a VIA C3 "Nehemiah". Selecting this enables usage of SSE and tells gcc to treat the CPU as a 686. Note, this kernel will not boot on older (pre model 9) C3s. @@ -238,14 +238,14 @@ config MVIAC3_2 config MVIAC7 bool "VIA C7" depends on X86_32 - help + ---help--- Select this for a VIA C7. Selecting this uses the correct cache shift and tells gcc to treat the CPU as a 686. config MPSC bool "Intel P4 / older Netburst based Xeon" depends on X86_64 - help + ---help--- Optimize for Intel Pentium 4, Pentium D and older Nocona/Dempsey Xeon CPUs with Intel 64bit which is compatible with x86-64. Note that the latest Xeons (Xeon 51xx and 53xx) are not based on the @@ -255,7 +255,7 @@ config MPSC config MCORE2 bool "Core 2/newer Xeon" - help + ---help--- Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and 53xx) CPUs. You can distinguish newer from older Xeons by the CPU @@ -265,7 +265,7 @@ config MCORE2 config GENERIC_CPU bool "Generic-x86-64" depends on X86_64 - help + ---help--- Generic x86-64 CPU. Run equally well on all x86-64 CPUs. @@ -274,7 +274,7 @@ endchoice config X86_GENERIC bool "Generic x86 support" depends on X86_32 - help + ---help--- Instead of just including optimizations for the selected x86 variant (e.g. PII, Crusoe or Athlon), include some more generic optimizations as well. This will make the kernel @@ -294,25 +294,23 @@ config X86_CPU # Define implied options from the CPU selection here config X86_L1_CACHE_BYTES int - default "128" if GENERIC_CPU || MPSC - default "64" if MK8 || MCORE2 - depends on X86_64 + default "128" if MPSC + default "64" if GENERIC_CPU || MK8 || MCORE2 || X86_32 config X86_INTERNODE_CACHE_BYTES int default "4096" if X86_VSMP default X86_L1_CACHE_BYTES if !X86_VSMP - depends on X86_64 config X86_CMPXCHG def_bool X86_64 || (X86_32 && !M386) config X86_L1_CACHE_SHIFT int - default "7" if MPENTIUM4 || X86_GENERIC || GENERIC_CPU || MPSC + default "7" if MPENTIUM4 || MPSC default "4" if X86_ELAN || M486 || M386 || MGEODEGX1 default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX - default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 + default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 || X86_GENERIC || GENERIC_CPU config X86_XADD def_bool y @@ -321,7 +319,7 @@ config X86_XADD config X86_PPRO_FENCE bool "PentiumPro memory ordering errata workaround" depends on M686 || M586MMX || M586TSC || M586 || M486 || M386 || MGEODEGX1 - help + ---help--- Old PentiumPro multiprocessor systems had errata that could cause memory operations to violate the x86 ordering standard in rare cases. Enabling this option will attempt to work around some (but not all) @@ -414,14 +412,14 @@ config X86_DEBUGCTLMSR menuconfig PROCESSOR_SELECT bool "Supported processor vendors" if EMBEDDED - help + ---help--- This lets you choose what x86 vendor support code your kernel will include. config CPU_SUP_INTEL default y bool "Support Intel processors" if PROCESSOR_SELECT - help + ---help--- This enables detection, tunings and quirks for Intel processors You need this enabled if you want your kernel to run on an @@ -435,7 +433,7 @@ config CPU_SUP_CYRIX_32 default y bool "Support Cyrix processors" if PROCESSOR_SELECT depends on !64BIT - help + ---help--- This enables detection, tunings and quirks for Cyrix processors You need this enabled if you want your kernel to run on a @@ -448,7 +446,7 @@ config CPU_SUP_CYRIX_32 config CPU_SUP_AMD default y bool "Support AMD processors" if PROCESSOR_SELECT - help + ---help--- This enables detection, tunings and quirks for AMD processors You need this enabled if you want your kernel to run on an @@ -458,25 +456,10 @@ config CPU_SUP_AMD If unsure, say N. -config CPU_SUP_CENTAUR_32 - default y - bool "Support Centaur processors" if PROCESSOR_SELECT - depends on !64BIT - help - This enables detection, tunings and quirks for Centaur processors - - You need this enabled if you want your kernel to run on a - Centaur CPU. Disabling this option on other types of CPUs - makes the kernel a tiny bit smaller. Disabling it on a Centaur - CPU might render the kernel unbootable. - - If unsure, say N. - -config CPU_SUP_CENTAUR_64 +config CPU_SUP_CENTAUR default y bool "Support Centaur processors" if PROCESSOR_SELECT - depends on 64BIT - help + ---help--- This enables detection, tunings and quirks for Centaur processors You need this enabled if you want your kernel to run on a @@ -490,7 +473,7 @@ config CPU_SUP_TRANSMETA_32 default y bool "Support Transmeta processors" if PROCESSOR_SELECT depends on !64BIT - help + ---help--- This enables detection, tunings and quirks for Transmeta processors You need this enabled if you want your kernel to run on a @@ -504,7 +487,7 @@ config CPU_SUP_UMC_32 default y bool "Support UMC processors" if PROCESSOR_SELECT depends on !64BIT - help + ---help--- This enables detection, tunings and quirks for UMC processors You need this enabled if you want your kernel to run on a @@ -523,8 +506,7 @@ config X86_PTRACE_BTS bool "Branch Trace Store" default y depends on X86_DEBUGCTLMSR - depends on BROKEN - help + ---help--- This adds a ptrace interface to the hardware's branch trace store. Debuggers may use it to collect an execution trace of the debugged Index: linux-2.6-tip/arch/x86/Kconfig.debug =================================================================== --- linux-2.6-tip.orig/arch/x86/Kconfig.debug +++ linux-2.6-tip/arch/x86/Kconfig.debug @@ -7,7 +7,7 @@ source "lib/Kconfig.debug" config STRICT_DEVMEM bool "Filter access to /dev/mem" - help + ---help--- If this option is disabled, you allow userspace (root) access to all of memory, including kernel and userspace memory. Accidental access to this is obviously disastrous, but specific access can @@ -25,7 +25,7 @@ config STRICT_DEVMEM config X86_VERBOSE_BOOTUP bool "Enable verbose x86 bootup info messages" default y - help + ---help--- Enables the informational output from the decompression stage (e.g. bzImage) of the boot. If you disable this you will still see errors. Disable this if you want silent bootup. @@ -33,7 +33,7 @@ config X86_VERBOSE_BOOTUP config EARLY_PRINTK bool "Early printk" if EMBEDDED default y - help + ---help--- Write kernel log output directly into the VGA buffer or to a serial port. @@ -47,7 +47,7 @@ config EARLY_PRINTK_DBGP bool "Early printk via EHCI debug port" default n depends on EARLY_PRINTK && PCI - help + ---help--- Write kernel log output directly into the EHCI debug port. This is useful for kernel debugging when your machine crashes very @@ -59,14 +59,14 @@ config EARLY_PRINTK_DBGP config DEBUG_STACKOVERFLOW bool "Check for stack overflows" depends on DEBUG_KERNEL - help + ---help--- This option will cause messages to be printed if free stack space drops below a certain limit. config DEBUG_STACK_USAGE bool "Stack utilization instrumentation" depends on DEBUG_KERNEL - help + ---help--- Enables the display of the minimum amount of free stack which each task has ever had available in the sysrq-T and sysrq-P debug output. @@ -75,7 +75,8 @@ config DEBUG_STACK_USAGE config DEBUG_PAGEALLOC bool "Debug page memory allocations" depends on DEBUG_KERNEL - help + depends on !KMEMCHECK + ---help--- Unmap pages from the kernel linear mapping after free_pages(). This results in a large slowdown, but helps to find certain types of memory corruptions. @@ -83,9 +84,10 @@ config DEBUG_PAGEALLOC config DEBUG_PER_CPU_MAPS bool "Debug access to per_cpu maps" depends on DEBUG_KERNEL - depends on X86_SMP + depends on SMP + depends on !PREEMPT_RT default n - help + ---help--- Say Y to verify that the per_cpu map being accessed has been setup. Adds a fair amount of code to kernel memory and decreases performance. @@ -96,7 +98,7 @@ config X86_PTDUMP bool "Export kernel pagetable layout to userspace via debugfs" depends on DEBUG_KERNEL select DEBUG_FS - help + ---help--- Say Y here if you want to show the kernel pagetable layout in a debugfs file. This information is only useful for kernel developers who are working in architecture specific areas of the kernel. @@ -108,7 +110,7 @@ config DEBUG_RODATA bool "Write protect kernel read-only data structures" default y depends on DEBUG_KERNEL - help + ---help--- Mark the kernel read-only data as write-protected in the pagetables, in order to catch accidental (and incorrect) writes to such const data. This is recommended so that we can catch kernel bugs sooner. @@ -117,7 +119,8 @@ config DEBUG_RODATA config DEBUG_RODATA_TEST bool "Testcase for the DEBUG_RODATA feature" depends on DEBUG_RODATA - help + default y + ---help--- This option enables a testcase for the DEBUG_RODATA feature as well as for the change_page_attr() infrastructure. If in doubt, say "N" @@ -125,7 +128,7 @@ config DEBUG_RODATA_TEST config DEBUG_NX_TEST tristate "Testcase for the NX non-executable stack feature" depends on DEBUG_KERNEL && m - help + ---help--- This option enables a testcase for the CPU NX capability and the software setup of this feature. If in doubt, say "N" @@ -133,7 +136,8 @@ config DEBUG_NX_TEST config 4KSTACKS bool "Use 4Kb for kernel stacks instead of 8Kb" depends on X86_32 - help + default y + ---help--- If you say Y here the kernel will use a 4Kb stacksize for the kernel stack attached to each process/thread. This facilitates running more threads on a system and also reduces the pressure @@ -144,7 +148,7 @@ config DOUBLEFAULT default y bool "Enable doublefault exception handler" if EMBEDDED depends on X86_32 - help + ---help--- This option allows trapping of rare doublefault exceptions that would otherwise cause a system to silently reboot. Disabling this option saves about 4k and might cause you much additional grey @@ -154,7 +158,7 @@ config IOMMU_DEBUG bool "Enable IOMMU debugging" depends on GART_IOMMU && DEBUG_KERNEL depends on X86_64 - help + ---help--- Force the IOMMU to on even when you have less than 4GB of memory and add debugging code. On overflow always panic. And allow to enable IOMMU leak tracing. Can be disabled at boot @@ -170,7 +174,7 @@ config IOMMU_LEAK bool "IOMMU leak tracing" depends on DEBUG_KERNEL depends on IOMMU_DEBUG - help + ---help--- Add a simple leak tracer to the IOMMU code. This is useful when you are debugging a buggy device driver that leaks IOMMU mappings. @@ -203,25 +207,25 @@ choice config IO_DELAY_0X80 bool "port 0x80 based port-IO delay [recommended]" - help + ---help--- This is the traditional Linux IO delay used for in/out_p. It is the most tested hence safest selection here. config IO_DELAY_0XED bool "port 0xed based port-IO delay" - help + ---help--- Use port 0xed as the IO delay. This frees up port 0x80 which is often used as a hardware-debug port. config IO_DELAY_UDELAY bool "udelay based port-IO delay" - help + ---help--- Use udelay(2) as the IO delay method. This provides the delay while not having any side-effect on the IO port space. config IO_DELAY_NONE bool "no port-IO delay" - help + ---help--- No port-IO delay. Will break on old boxes that require port-IO delay for certain operations. Should work on most new machines. @@ -255,18 +259,18 @@ config DEBUG_BOOT_PARAMS bool "Debug boot parameters" depends on DEBUG_KERNEL depends on DEBUG_FS - help + ---help--- This option will cause struct boot_params to be exported via debugfs. config CPA_DEBUG bool "CPA self-test code" depends on DEBUG_KERNEL - help + ---help--- Do change_page_attr() self-tests every 30 seconds. config OPTIMIZE_INLINING bool "Allow gcc to uninline functions marked 'inline'" - help + ---help--- This option determines if the kernel forces gcc to inline the functions developers have marked 'inline'. Doing so takes away freedom from gcc to do what it thinks is best, which is desirable for the gcc 3.x series of @@ -279,4 +283,3 @@ config OPTIMIZE_INLINING If unsure, say N. endmenu - Index: linux-2.6-tip/arch/x86/Makefile =================================================================== --- linux-2.6-tip.orig/arch/x86/Makefile +++ linux-2.6-tip/arch/x86/Makefile @@ -70,14 +70,22 @@ else # this works around some issues with generating unwind tables in older gccs # newer gccs do it by default KBUILD_CFLAGS += -maccumulate-outgoing-args +endif - stackp := $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh - stackp-$(CONFIG_CC_STACKPROTECTOR) := $(shell $(stackp) \ - "$(CC)" -fstack-protector ) - stackp-$(CONFIG_CC_STACKPROTECTOR_ALL) += $(shell $(stackp) \ - "$(CC)" -fstack-protector-all ) +ifdef CONFIG_CC_STACKPROTECTOR + cc_has_sp := $(srctree)/scripts/gcc-x86_$(BITS)-has-stack-protector.sh + ifeq ($(shell $(CONFIG_SHELL) $(cc_has_sp) $(CC)),y) + stackp-y := -fstack-protector + stackp-$(CONFIG_CC_STACKPROTECTOR_ALL) += -fstack-protector-all + KBUILD_CFLAGS += $(stackp-y) + else + $(warning stack protector enabled but no compiler support) + endif +endif - KBUILD_CFLAGS += $(stackp-y) +# Don't unroll struct assignments with kmemcheck enabled +ifeq ($(CONFIG_KMEMCHECK),y) + KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy) endif # Stackpointer is addressed different for 32 bit and 64 bit x86 @@ -102,29 +110,6 @@ KBUILD_CFLAGS += -fno-asynchronous-unwin # prevent gcc from generating any FP code by mistake KBUILD_CFLAGS += $(call cc-option,-mno-sse -mno-mmx -mno-sse2 -mno-3dnow,) -### -# Sub architecture support -# fcore-y is linked before mcore-y files. - -# Default subarch .c files -mcore-y := arch/x86/mach-default/ - -# Voyager subarch support -mflags-$(CONFIG_X86_VOYAGER) := -Iarch/x86/include/asm/mach-voyager -mcore-$(CONFIG_X86_VOYAGER) := arch/x86/mach-voyager/ - -# generic subarchitecture -mflags-$(CONFIG_X86_GENERICARCH):= -Iarch/x86/include/asm/mach-generic -fcore-$(CONFIG_X86_GENERICARCH) += arch/x86/mach-generic/ -mcore-$(CONFIG_X86_GENERICARCH) := arch/x86/mach-default/ - -# default subarch .h files -mflags-y += -Iarch/x86/include/asm/mach-default - -# 64 bit does not support subarch support - clear sub arch variables -fcore-$(CONFIG_X86_64) := -mcore-$(CONFIG_X86_64) := - KBUILD_CFLAGS += $(mflags-y) KBUILD_AFLAGS += $(mflags-y) @@ -150,9 +135,6 @@ core-$(CONFIG_LGUEST_GUEST) += arch/x86/ core-y += arch/x86/kernel/ core-y += arch/x86/mm/ -# Remaining sub architecture files -core-y += $(mcore-y) - core-y += arch/x86/crypto/ core-y += arch/x86/vdso/ core-$(CONFIG_IA32_EMULATION) += arch/x86/ia32/ @@ -176,34 +158,23 @@ endif boot := arch/x86/boot -PHONY += zImage bzImage compressed zlilo bzlilo \ - zdisk bzdisk fdimage fdimage144 fdimage288 isoimage install +BOOT_TARGETS = bzlilo bzdisk fdimage fdimage144 fdimage288 isoimage install + +PHONY += bzImage $(BOOT_TARGETS) # Default kernel to build all: bzImage # KBUILD_IMAGE specify target image being built - KBUILD_IMAGE := $(boot)/bzImage -zImage zlilo zdisk: KBUILD_IMAGE := $(boot)/zImage +KBUILD_IMAGE := $(boot)/bzImage -zImage bzImage: vmlinux +bzImage: vmlinux $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ -compressed: zImage - -zlilo bzlilo: vmlinux - $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(KBUILD_IMAGE) zlilo - -zdisk bzdisk: vmlinux - $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(KBUILD_IMAGE) zdisk - -fdimage fdimage144 fdimage288 isoimage: vmlinux - $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(KBUILD_IMAGE) $@ - -install: - $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(KBUILD_IMAGE) install +$(BOOT_TARGETS): vmlinux + $(Q)$(MAKE) $(build)=$(boot) $@ PHONY += vdso_install vdso_install: @@ -228,7 +199,3 @@ define archhelp echo ' FDARGS="..." arguments for the booted kernel' echo ' FDINITRD=file initrd for the booted kernel' endef - -CLEAN_FILES += arch/x86/boot/fdimage \ - arch/x86/boot/image.iso \ - arch/x86/boot/mtools.conf Index: linux-2.6-tip/arch/x86/boot/Makefile =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/Makefile +++ linux-2.6-tip/arch/x86/boot/Makefile @@ -6,33 +6,30 @@ # for more details. # # Copyright (C) 1994 by Linus Torvalds +# Changed by many, many contributors over the years. # # ROOT_DEV specifies the default root-device when making the image. # This can be either FLOPPY, CURRENT, /dev/xxxx or empty, in which case # the default of FLOPPY is used by 'build'. -ROOT_DEV := CURRENT +ROOT_DEV := CURRENT # If you want to preset the SVGA mode, uncomment the next line and # set SVGA_MODE to whatever number you want. # Set it to -DSVGA_MODE=NORMAL_VGA if you just want the EGA/VGA mode. # The number is the same as you would ordinarily press at bootup. -SVGA_MODE := -DSVGA_MODE=NORMAL_VGA +SVGA_MODE := -DSVGA_MODE=NORMAL_VGA -# If you want the RAM disk device, define this to be the size in blocks. - -#RAMDISK := -DRAMDISK=512 - -targets := vmlinux.bin setup.bin setup.elf zImage bzImage +targets := vmlinux.bin setup.bin setup.elf bzImage +targets += fdimage fdimage144 fdimage288 image.iso mtools.conf subdir- := compressed setup-y += a20.o cmdline.o copy.o cpu.o cpucheck.o edd.o setup-y += header.o main.o mca.o memory.o pm.o pmjump.o setup-y += printf.o string.o tty.o video.o video-mode.o version.o setup-$(CONFIG_X86_APM_BOOT) += apm.o -setup-$(CONFIG_X86_VOYAGER) += voyager.o # The link order of the video-*.o modules can matter. In particular, # video-vga.o *must* be listed first, followed by video-vesa.o. @@ -72,17 +69,13 @@ KBUILD_CFLAGS := $(LINUXINCLUDE) -g -Os KBUILD_CFLAGS += $(call cc-option,-m32) KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ -$(obj)/zImage: asflags-y := $(SVGA_MODE) $(RAMDISK) -$(obj)/bzImage: ccflags-y := -D__BIG_KERNEL__ -$(obj)/bzImage: asflags-y := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__ -$(obj)/bzImage: BUILDFLAGS := -b +$(obj)/bzImage: asflags-y := $(SVGA_MODE) quiet_cmd_image = BUILD $@ -cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/setup.bin \ - $(obj)/vmlinux.bin $(ROOT_DEV) > $@ +cmd_image = $(obj)/tools/build $(obj)/setup.bin $(obj)/vmlinux.bin \ + $(ROOT_DEV) > $@ -$(obj)/zImage $(obj)/bzImage: $(obj)/setup.bin \ - $(obj)/vmlinux.bin $(obj)/tools/build FORCE +$(obj)/bzImage: $(obj)/setup.bin $(obj)/vmlinux.bin $(obj)/tools/build FORCE $(call if_changed,image) @echo 'Kernel: $@ is ready' ' (#'`cat .version`')' @@ -117,9 +110,11 @@ $(obj)/setup.bin: $(obj)/setup.elf FORCE $(obj)/compressed/vmlinux: FORCE $(Q)$(MAKE) $(build)=$(obj)/compressed $@ -# Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel +# Set this if you want to pass append arguments to the +# bzdisk/fdimage/isoimage kernel FDARGS = -# Set this if you want an initrd included with the zdisk/fdimage/isoimage kernel +# Set this if you want an initrd included with the +# bzdisk/fdimage/isoimage kernel FDINITRD = image_cmdline = default linux $(FDARGS) $(if $(FDINITRD),initrd=initrd.img,) @@ -128,7 +123,7 @@ $(obj)/mtools.conf: $(src)/mtools.conf.i sed -e 's|@OBJ@|$(obj)|g' < $< > $@ # This requires write access to /dev/fd0 -zdisk: $(BOOTIMAGE) $(obj)/mtools.conf +bzdisk: $(obj)/bzImage $(obj)/mtools.conf MTOOLSRC=$(obj)/mtools.conf mformat a: ; sync syslinux /dev/fd0 ; sync echo '$(image_cmdline)' | \ @@ -136,10 +131,10 @@ zdisk: $(BOOTIMAGE) $(obj)/mtools.conf if [ -f '$(FDINITRD)' ] ; then \ MTOOLSRC=$(obj)/mtools.conf mcopy '$(FDINITRD)' a:initrd.img ; \ fi - MTOOLSRC=$(obj)/mtools.conf mcopy $(BOOTIMAGE) a:linux ; sync + MTOOLSRC=$(obj)/mtools.conf mcopy $(obj)/bzImage a:linux ; sync # These require being root or having syslinux 2.02 or higher installed -fdimage fdimage144: $(BOOTIMAGE) $(obj)/mtools.conf +fdimage fdimage144: $(obj)/bzImage $(obj)/mtools.conf dd if=/dev/zero of=$(obj)/fdimage bs=1024 count=1440 MTOOLSRC=$(obj)/mtools.conf mformat v: ; sync syslinux $(obj)/fdimage ; sync @@ -148,9 +143,9 @@ fdimage fdimage144: $(BOOTIMAGE) $(obj)/ if [ -f '$(FDINITRD)' ] ; then \ MTOOLSRC=$(obj)/mtools.conf mcopy '$(FDINITRD)' v:initrd.img ; \ fi - MTOOLSRC=$(obj)/mtools.conf mcopy $(BOOTIMAGE) v:linux ; sync + MTOOLSRC=$(obj)/mtools.conf mcopy $(obj)/bzImage v:linux ; sync -fdimage288: $(BOOTIMAGE) $(obj)/mtools.conf +fdimage288: $(obj)/bzImage $(obj)/mtools.conf dd if=/dev/zero of=$(obj)/fdimage bs=1024 count=2880 MTOOLSRC=$(obj)/mtools.conf mformat w: ; sync syslinux $(obj)/fdimage ; sync @@ -159,9 +154,9 @@ fdimage288: $(BOOTIMAGE) $(obj)/mtools.c if [ -f '$(FDINITRD)' ] ; then \ MTOOLSRC=$(obj)/mtools.conf mcopy '$(FDINITRD)' w:initrd.img ; \ fi - MTOOLSRC=$(obj)/mtools.conf mcopy $(BOOTIMAGE) w:linux ; sync + MTOOLSRC=$(obj)/mtools.conf mcopy $(obj)/bzImage w:linux ; sync -isoimage: $(BOOTIMAGE) +isoimage: $(obj)/bzImage -rm -rf $(obj)/isoimage mkdir $(obj)/isoimage for i in lib lib64 share end ; do \ @@ -171,7 +166,7 @@ isoimage: $(BOOTIMAGE) fi ; \ if [ $$i = end ] ; then exit 1 ; fi ; \ done - cp $(BOOTIMAGE) $(obj)/isoimage/linux + cp $(obj)/bzImage $(obj)/isoimage/linux echo '$(image_cmdline)' > $(obj)/isoimage/isolinux.cfg if [ -f '$(FDINITRD)' ] ; then \ cp '$(FDINITRD)' $(obj)/isoimage/initrd.img ; \ @@ -182,12 +177,13 @@ isoimage: $(BOOTIMAGE) isohybrid $(obj)/image.iso 2>/dev/null || true rm -rf $(obj)/isoimage -zlilo: $(BOOTIMAGE) +bzlilo: $(obj)/bzImage if [ -f $(INSTALL_PATH)/vmlinuz ]; then mv $(INSTALL_PATH)/vmlinuz $(INSTALL_PATH)/vmlinuz.old; fi if [ -f $(INSTALL_PATH)/System.map ]; then mv $(INSTALL_PATH)/System.map $(INSTALL_PATH)/System.old; fi - cat $(BOOTIMAGE) > $(INSTALL_PATH)/vmlinuz + cat $(obj)/bzImage > $(INSTALL_PATH)/vmlinuz cp System.map $(INSTALL_PATH)/ if [ -x /sbin/lilo ]; then /sbin/lilo; else /etc/lilo/install; fi install: - sh $(srctree)/$(src)/install.sh $(KERNELRELEASE) $(BOOTIMAGE) System.map "$(INSTALL_PATH)" + sh $(srctree)/$(src)/install.sh $(KERNELRELEASE) $(obj)/bzImage \ + System.map "$(INSTALL_PATH)" Index: linux-2.6-tip/arch/x86/boot/a20.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/a20.c +++ linux-2.6-tip/arch/x86/boot/a20.c @@ -2,6 +2,7 @@ * * Copyright (C) 1991, 1992 Linus Torvalds * Copyright 2007-2008 rPath, Inc. - All Rights Reserved + * Copyright 2009 Intel Corporation * * This file is part of the Linux kernel, and is made available under * the terms of the GNU General Public License version 2. @@ -15,16 +16,23 @@ #include "boot.h" #define MAX_8042_LOOPS 100000 +#define MAX_8042_FF 32 static int empty_8042(void) { u8 status; int loops = MAX_8042_LOOPS; + int ffs = MAX_8042_FF; while (loops--) { io_delay(); status = inb(0x64); + if (status == 0xff) { + /* FF is a plausible, but very unlikely status */ + if (!--ffs) + return -1; /* Assume no KBC present */ + } if (status & 1) { /* Read and discard input data */ io_delay(); @@ -118,44 +126,37 @@ static void enable_a20_fast(void) int enable_a20(void) { -#if defined(CONFIG_X86_ELAN) - /* Elan croaks if we try to touch the KBC */ - enable_a20_fast(); - while (!a20_test_long()) - ; - return 0; -#elif defined(CONFIG_X86_VOYAGER) - /* On Voyager, a20_test() is unsafe? */ - enable_a20_kbc(); - return 0; -#else int loops = A20_ENABLE_LOOPS; - while (loops--) { - /* First, check to see if A20 is already enabled - (legacy free, etc.) */ - if (a20_test_short()) - return 0; - - /* Next, try the BIOS (INT 0x15, AX=0x2401) */ - enable_a20_bios(); - if (a20_test_short()) - return 0; - - /* Try enabling A20 through the keyboard controller */ - empty_8042(); - if (a20_test_short()) - return 0; /* BIOS worked, but with delayed reaction */ + int kbc_err; - enable_a20_kbc(); - if (a20_test_long()) - return 0; - - /* Finally, try enabling the "fast A20 gate" */ - enable_a20_fast(); - if (a20_test_long()) - return 0; - } - - return -1; -#endif + while (loops--) { + /* First, check to see if A20 is already enabled + (legacy free, etc.) */ + if (a20_test_short()) + return 0; + + /* Next, try the BIOS (INT 0x15, AX=0x2401) */ + enable_a20_bios(); + if (a20_test_short()) + return 0; + + /* Try enabling A20 through the keyboard controller */ + kbc_err = empty_8042(); + + if (a20_test_short()) + return 0; /* BIOS worked, but with delayed reaction */ + + if (!kbc_err) { + enable_a20_kbc(); + if (a20_test_long()) + return 0; + } + + /* Finally, try enabling the "fast A20 gate" */ + enable_a20_fast(); + if (a20_test_long()) + return 0; + } + + return -1; } Index: linux-2.6-tip/arch/x86/boot/boot.h =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/boot.h +++ linux-2.6-tip/arch/x86/boot/boot.h @@ -302,9 +302,6 @@ void probe_cards(int unsafe); /* video-vesa.c */ void vesa_store_edid(void); -/* voyager.c */ -int query_voyager(void); - #endif /* __ASSEMBLY__ */ #endif /* BOOT_BOOT_H */ Index: linux-2.6-tip/arch/x86/boot/compressed/Makefile =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/compressed/Makefile +++ linux-2.6-tip/arch/x86/boot/compressed/Makefile @@ -4,7 +4,7 @@ # create a compressed vmlinux image from the original vmlinux # -targets := vmlinux vmlinux.bin vmlinux.bin.gz head_$(BITS).o misc.o piggy.o +targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma head_$(BITS).o misc.o piggy.o KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2 KBUILD_CFLAGS += -fno-strict-aliasing -fPIC @@ -47,18 +47,35 @@ ifeq ($(CONFIG_X86_32),y) ifdef CONFIG_RELOCATABLE $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin.all FORCE $(call if_changed,gzip) +$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin.all FORCE + $(call if_changed,bzip2) +$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin.all FORCE + $(call if_changed,lzma) else $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE $(call if_changed,gzip) +$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE + $(call if_changed,bzip2) +$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE + $(call if_changed,lzma) endif LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T else + $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE $(call if_changed,gzip) +$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE + $(call if_changed,bzip2) +$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE + $(call if_changed,lzma) LDFLAGS_piggy.o := -r --format binary --oformat elf64-x86-64 -T endif -$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE +suffix_$(CONFIG_KERNEL_GZIP) = gz +suffix_$(CONFIG_KERNEL_BZIP2) = bz2 +suffix_$(CONFIG_KERNEL_LZMA) = lzma + +$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix_y) FORCE $(call if_changed,ld) Index: linux-2.6-tip/arch/x86/boot/compressed/head_32.S =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/compressed/head_32.S +++ linux-2.6-tip/arch/x86/boot/compressed/head_32.S @@ -25,14 +25,12 @@ #include #include -#include +#include #include #include .section ".text.head","ax",@progbits - .globl startup_32 - -startup_32: +ENTRY(startup_32) cld /* test KEEP_SEGMENTS flag to see if the bootloader is asking * us to not reload segments */ @@ -113,6 +111,8 @@ startup_32: */ leal relocated(%ebx), %eax jmp *%eax +ENDPROC(startup_32) + .section ".text" relocated: Index: linux-2.6-tip/arch/x86/boot/compressed/head_64.S =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/compressed/head_64.S +++ linux-2.6-tip/arch/x86/boot/compressed/head_64.S @@ -26,8 +26,8 @@ #include #include -#include -#include +#include +#include #include #include #include @@ -35,9 +35,7 @@ .section ".text.head" .code32 - .globl startup_32 - -startup_32: +ENTRY(startup_32) cld /* test KEEP_SEGMENTS flag to see if the bootloader is asking * us to not reload segments */ @@ -176,6 +174,7 @@ startup_32: /* Jump from 32bit compatibility mode into 64bit mode. */ lret +ENDPROC(startup_32) no_longmode: /* This isn't an x86-64 CPU so hang */ @@ -295,7 +294,6 @@ relocated: call decompress_kernel popq %rsi - /* * Jump to the decompressed kernel. */ Index: linux-2.6-tip/arch/x86/boot/compressed/misc.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/compressed/misc.c +++ linux-2.6-tip/arch/x86/boot/compressed/misc.c @@ -116,71 +116,13 @@ /* * gzip declarations */ - -#define OF(args) args #define STATIC static #undef memset #undef memcpy #define memzero(s, n) memset((s), 0, (n)) -typedef unsigned char uch; -typedef unsigned short ush; -typedef unsigned long ulg; - -/* - * Window size must be at least 32k, and a power of two. - * We don't actually have a window just a huge output buffer, - * so we report a 2G window size, as that should always be - * larger than our output buffer: - */ -#define WSIZE 0x80000000 - -/* Input buffer: */ -static unsigned char *inbuf; - -/* Sliding window buffer (and final output buffer): */ -static unsigned char *window; -/* Valid bytes in inbuf: */ -static unsigned insize; - -/* Index of next byte to be processed in inbuf: */ -static unsigned inptr; - -/* Bytes in output buffer: */ -static unsigned outcnt; - -/* gzip flag byte */ -#define ASCII_FLAG 0x01 /* bit 0 set: file probably ASCII text */ -#define CONTINUATION 0x02 /* bit 1 set: continuation of multi-part gz file */ -#define EXTRA_FIELD 0x04 /* bit 2 set: extra field present */ -#define ORIG_NAM 0x08 /* bit 3 set: original file name present */ -#define COMMENT 0x10 /* bit 4 set: file comment present */ -#define ENCRYPTED 0x20 /* bit 5 set: file is encrypted */ -#define RESERVED 0xC0 /* bit 6, 7: reserved */ - -#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf()) - -/* Diagnostic functions */ -#ifdef DEBUG -# define Assert(cond, msg) do { if (!(cond)) error(msg); } while (0) -# define Trace(x) do { fprintf x; } while (0) -# define Tracev(x) do { if (verbose) fprintf x ; } while (0) -# define Tracevv(x) do { if (verbose > 1) fprintf x ; } while (0) -# define Tracec(c, x) do { if (verbose && (c)) fprintf x ; } while (0) -# define Tracecv(c, x) do { if (verbose > 1 && (c)) fprintf x ; } while (0) -#else -# define Assert(cond, msg) -# define Trace(x) -# define Tracev(x) -# define Tracevv(x) -# define Tracec(c, x) -# define Tracecv(c, x) -#endif - -static int fill_inbuf(void); -static void flush_window(void); static void error(char *m); /* @@ -189,13 +131,8 @@ static void error(char *m); static struct boot_params *real_mode; /* Pointer to real-mode data */ static int quiet; -extern unsigned char input_data[]; -extern int input_len; - -static long bytes_out; - static void *memset(void *s, int c, unsigned n); -static void *memcpy(void *dest, const void *src, unsigned n); +void *memcpy(void *dest, const void *src, unsigned n); static void __putstr(int, const char *); #define putstr(__x) __putstr(0, __x) @@ -213,7 +150,17 @@ static char *vidmem; static int vidport; static int lines, cols; -#include "../../../../lib/inflate.c" +#ifdef CONFIG_KERNEL_GZIP +#include "../../../../lib/decompress_inflate.c" +#endif + +#ifdef CONFIG_KERNEL_BZIP2 +#include "../../../../lib/decompress_bunzip2.c" +#endif + +#ifdef CONFIG_KERNEL_LZMA +#include "../../../../lib/decompress_unlzma.c" +#endif static void scroll(void) { @@ -282,7 +229,7 @@ static void *memset(void *s, int c, unsi return s; } -static void *memcpy(void *dest, const void *src, unsigned n) +void *memcpy(void *dest, const void *src, unsigned n) { int i; const char *s = src; @@ -293,38 +240,6 @@ static void *memcpy(void *dest, const vo return dest; } -/* =========================================================================== - * Fill the input buffer. This is called only when the buffer is empty - * and at least one byte is really needed. - */ -static int fill_inbuf(void) -{ - error("ran out of input data"); - return 0; -} - -/* =========================================================================== - * Write the output window window[0..outcnt-1] and update crc and bytes_out. - * (Used for the decompressed data only.) - */ -static void flush_window(void) -{ - /* With my window equal to my output buffer - * I only need to compute the crc here. - */ - unsigned long c = crc; /* temporary variable */ - unsigned n; - unsigned char *in, ch; - - in = window; - for (n = 0; n < outcnt; n++) { - ch = *in++; - c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8); - } - crc = c; - bytes_out += (unsigned long)outcnt; - outcnt = 0; -} static void error(char *x) { @@ -407,12 +322,8 @@ asmlinkage void decompress_kernel(void * lines = real_mode->screen_info.orig_video_lines; cols = real_mode->screen_info.orig_video_cols; - window = output; /* Output buffer (Normally at 1M) */ free_mem_ptr = heap; /* Heap */ free_mem_end_ptr = heap + BOOT_HEAP_SIZE; - inbuf = input_data; /* Input buffer */ - insize = input_len; - inptr = 0; #ifdef CONFIG_X86_64 if ((unsigned long)output & (__KERNEL_ALIGN - 1)) @@ -430,10 +341,9 @@ asmlinkage void decompress_kernel(void * #endif #endif - makecrc(); if (!quiet) putstr("\nDecompressing Linux... "); - gunzip(); + decompress(input_data, input_len, NULL, NULL, output, NULL, error); parse_elf(output); if (!quiet) putstr("done.\nBooting the kernel.\n"); Index: linux-2.6-tip/arch/x86/boot/copy.S =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/copy.S +++ linux-2.6-tip/arch/x86/boot/copy.S @@ -8,6 +8,8 @@ * * ----------------------------------------------------------------------- */ +#include + /* * Memory copy routines */ @@ -15,9 +17,7 @@ .code16gcc .text - .globl memcpy - .type memcpy, @function -memcpy: +GLOBAL(memcpy) pushw %si pushw %di movw %ax, %di @@ -31,11 +31,9 @@ memcpy: popw %di popw %si ret - .size memcpy, .-memcpy +ENDPROC(memcpy) - .globl memset - .type memset, @function -memset: +GLOBAL(memset) pushw %di movw %ax, %di movzbl %dl, %eax @@ -48,52 +46,42 @@ memset: rep; stosb popw %di ret - .size memset, .-memset +ENDPROC(memset) - .globl copy_from_fs - .type copy_from_fs, @function -copy_from_fs: +GLOBAL(copy_from_fs) pushw %ds pushw %fs popw %ds call memcpy popw %ds ret - .size copy_from_fs, .-copy_from_fs +ENDPROC(copy_from_fs) - .globl copy_to_fs - .type copy_to_fs, @function -copy_to_fs: +GLOBAL(copy_to_fs) pushw %es pushw %fs popw %es call memcpy popw %es ret - .size copy_to_fs, .-copy_to_fs +ENDPROC(copy_to_fs) #if 0 /* Not currently used, but can be enabled as needed */ - - .globl copy_from_gs - .type copy_from_gs, @function -copy_from_gs: +GLOBAL(copy_from_gs) pushw %ds pushw %gs popw %ds call memcpy popw %ds ret - .size copy_from_gs, .-copy_from_gs - .globl copy_to_gs +ENDPROC(copy_from_gs) - .type copy_to_gs, @function -copy_to_gs: +GLOBAL(copy_to_gs) pushw %es pushw %gs popw %es call memcpy popw %es ret - .size copy_to_gs, .-copy_to_gs - +ENDPROC(copy_to_gs) #endif Index: linux-2.6-tip/arch/x86/boot/header.S =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/header.S +++ linux-2.6-tip/arch/x86/boot/header.S @@ -19,17 +19,13 @@ #include #include #include -#include +#include #include #include "boot.h" #include "offsets.h" -SETUPSECTS = 4 /* default nr of setup-sectors */ BOOTSEG = 0x07C0 /* original address of boot-sector */ -SYSSEG = DEF_SYSSEG /* system loaded at 0x10000 (65536) */ -SYSSIZE = DEF_SYSSIZE /* system size: # of 16-byte clicks */ - /* to be loaded */ -ROOT_DEV = 0 /* ROOT_DEV is now written by "build" */ +SYSSEG = 0x1000 /* historical load address >> 4 */ #ifndef SVGA_MODE #define SVGA_MODE ASK_VGA @@ -97,12 +93,12 @@ bugger_off_msg: .section ".header", "a" .globl hdr hdr: -setup_sects: .byte SETUPSECTS +setup_sects: .byte 0 /* Filled in by build.c */ root_flags: .word ROOT_RDONLY -syssize: .long SYSSIZE -ram_size: .word RAMDISK +syssize: .long 0 /* Filled in by build.c */ +ram_size: .word 0 /* Obsolete */ vid_mode: .word SVGA_MODE -root_dev: .word ROOT_DEV +root_dev: .word 0 /* Filled in by build.c */ boot_flag: .word 0xAA55 # offset 512, entry point @@ -123,14 +119,15 @@ _start: # or else old loadlin-1.5 will fail) .globl realmode_swtch realmode_swtch: .word 0, 0 # default_switch, SETUPSEG -start_sys_seg: .word SYSSEG +start_sys_seg: .word SYSSEG # obsolete and meaningless, but just + # in case something decided to "use" it .word kernel_version-512 # pointing to kernel version string # above section of header is compatible # with loadlin-1.5 (header v1.5). Don't # change it. -type_of_loader: .byte 0 # = 0, old one (LILO, Loadlin, - # Bootlin, SYSLX, bootsect...) +type_of_loader: .byte 0 # 0 means ancient bootloader, newer + # bootloaders know to change this. # See Documentation/i386/boot.txt for # assigned ids @@ -142,11 +139,7 @@ CAN_USE_HEAP = 0x80 # If set, the load # space behind setup.S can be used for # heap purposes. # Only the loader knows what is free -#ifndef __BIG_KERNEL__ - .byte 0 -#else .byte LOADED_HIGH -#endif setup_move_size: .word 0x8000 # size to move, when setup is not # loaded at 0x90000. We will move setup @@ -157,11 +150,7 @@ setup_move_size: .word 0x8000 # size t code32_start: # here loaders can put a different # start address for 32-bit code. -#ifndef __BIG_KERNEL__ - .long 0x1000 # 0x1000 = default for zImage -#else .long 0x100000 # 0x100000 = default for big kernel -#endif ramdisk_image: .long 0 # address of loaded ramdisk image # Here the loader puts the 32-bit Index: linux-2.6-tip/arch/x86/boot/main.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/main.c +++ linux-2.6-tip/arch/x86/boot/main.c @@ -149,11 +149,6 @@ void main(void) /* Query MCA information */ query_mca(); - /* Voyager */ -#ifdef CONFIG_X86_VOYAGER - query_voyager(); -#endif - /* Query Intel SpeedStep (IST) information */ query_ist(); Index: linux-2.6-tip/arch/x86/boot/pm.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/pm.c +++ linux-2.6-tip/arch/x86/boot/pm.c @@ -33,47 +33,6 @@ static void realmode_switch_hook(void) } /* - * A zImage kernel is loaded at 0x10000 but wants to run at 0x1000. - * A bzImage kernel is loaded and runs at 0x100000. - */ -static void move_kernel_around(void) -{ - /* Note: rely on the compile-time option here rather than - the LOADED_HIGH flag. The Qemu kernel loader unconditionally - sets the loadflags to zero. */ -#ifndef __BIG_KERNEL__ - u16 dst_seg, src_seg; - u32 syssize; - - dst_seg = 0x1000 >> 4; - src_seg = 0x10000 >> 4; - syssize = boot_params.hdr.syssize; /* Size in 16-byte paragraphs */ - - while (syssize) { - int paras = (syssize >= 0x1000) ? 0x1000 : syssize; - int dwords = paras << 2; - - asm volatile("pushw %%es ; " - "pushw %%ds ; " - "movw %1,%%es ; " - "movw %2,%%ds ; " - "xorw %%di,%%di ; " - "xorw %%si,%%si ; " - "rep;movsl ; " - "popw %%ds ; " - "popw %%es" - : "+c" (dwords) - : "r" (dst_seg), "r" (src_seg) - : "esi", "edi"); - - syssize -= paras; - dst_seg += paras; - src_seg += paras; - } -#endif -} - -/* * Disable all interrupts at the legacy PIC. */ static void mask_all_interrupts(void) @@ -147,9 +106,6 @@ void go_to_protected_mode(void) /* Hook before leaving real mode, also disables interrupts */ realmode_switch_hook(); - /* Move the kernel/setup to their final resting places */ - move_kernel_around(); - /* Enable the A20 gate */ if (enable_a20()) { puts("A20 gate not responding, unable to boot...\n"); Index: linux-2.6-tip/arch/x86/boot/pmjump.S =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/pmjump.S +++ linux-2.6-tip/arch/x86/boot/pmjump.S @@ -15,18 +15,15 @@ #include #include #include +#include .text - - .globl protected_mode_jump - .type protected_mode_jump, @function - .code16 /* * void protected_mode_jump(u32 entrypoint, u32 bootparams); */ -protected_mode_jump: +GLOBAL(protected_mode_jump) movl %edx, %esi # Pointer to boot_params table xorl %ebx, %ebx @@ -47,12 +44,11 @@ protected_mode_jump: .byte 0x66, 0xea # ljmpl opcode 2: .long in_pm32 # offset .word __BOOT_CS # segment - - .size protected_mode_jump, .-protected_mode_jump +ENDPROC(protected_mode_jump) .code32 - .type in_pm32, @function -in_pm32: + .section ".text32","ax" +GLOBAL(in_pm32) # Set up data segments for flat 32-bit mode movl %ecx, %ds movl %ecx, %es @@ -78,5 +74,4 @@ in_pm32: lldt %cx jmpl *%eax # Jump to the 32-bit entrypoint - - .size in_pm32, .-in_pm32 +ENDPROC(in_pm32) Index: linux-2.6-tip/arch/x86/boot/setup.ld =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/setup.ld +++ linux-2.6-tip/arch/x86/boot/setup.ld @@ -17,7 +17,8 @@ SECTIONS .header : { *(.header) } .inittext : { *(.inittext) } .initdata : { *(.initdata) } - .text : { *(.text*) } + .text : { *(.text) } + .text32 : { *(.text32) } . = ALIGN(16); .rodata : { *(.rodata*) } Index: linux-2.6-tip/arch/x86/boot/tools/build.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/tools/build.c +++ linux-2.6-tip/arch/x86/boot/tools/build.c @@ -130,7 +130,7 @@ static void die(const char * str, ...) static void usage(void) { - die("Usage: build [-b] setup system [rootdev] [> image]"); + die("Usage: build setup system [rootdev] [> image]"); } int main(int argc, char ** argv) @@ -145,11 +145,6 @@ int main(int argc, char ** argv) void *kernel; u32 crc = 0xffffffffUL; - if (argc > 2 && !strcmp(argv[1], "-b")) - { - is_big_kernel = 1; - argc--, argv++; - } if ((argc < 3) || (argc > 4)) usage(); if (argc > 3) { @@ -216,8 +211,6 @@ int main(int argc, char ** argv) die("Unable to mmap '%s': %m", argv[2]); /* Number of 16-byte paragraphs, including space for a 4-byte CRC */ sys_size = (sz + 15 + 4) / 16; - if (!is_big_kernel && sys_size > DEF_SYSSIZE) - die("System is too big. Try using bzImage or modules."); /* Patch the setup code with the appropriate size parameters */ buf[0x1f1] = setup_sectors-1; Index: linux-2.6-tip/arch/x86/boot/video-mode.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/video-mode.c +++ linux-2.6-tip/arch/x86/boot/video-mode.c @@ -147,7 +147,7 @@ static void vga_recalc_vertical(void) int set_mode(u16 mode) { int rv; - u16 real_mode; + u16 uninitialized_var(real_mode); /* Very special mode numbers... */ if (mode == VIDEO_CURRENT_MODE) Index: linux-2.6-tip/arch/x86/boot/video-vga.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/video-vga.c +++ linux-2.6-tip/arch/x86/boot/video-vga.c @@ -129,41 +129,45 @@ u16 vga_crtc(void) return (inb(0x3cc) & 1) ? 0x3d4 : 0x3b4; } -static void vga_set_480_scanlines(int end) +static void vga_set_480_scanlines(int lines) { - u16 crtc; - u8 csel; + u16 crtc; /* CRTC base address */ + u8 csel; /* CRTC miscellaneous output register */ + u8 ovfw; /* CRTC overflow register */ + int end = lines-1; crtc = vga_crtc(); + ovfw = 0x3c | ((end >> (8-1)) & 0x02) | ((end >> (9-6)) & 0x40); + out_idx(0x0c, crtc, 0x11); /* Vertical sync end, unlock CR0-7 */ out_idx(0x0b, crtc, 0x06); /* Vertical total */ - out_idx(0x3e, crtc, 0x07); /* Vertical overflow */ + out_idx(ovfw, crtc, 0x07); /* Vertical overflow */ out_idx(0xea, crtc, 0x10); /* Vertical sync start */ - out_idx(end, crtc, 0x12); /* Vertical display end */ + out_idx(end, crtc, 0x12); /* Vertical display end */ out_idx(0xe7, crtc, 0x15); /* Vertical blank start */ out_idx(0x04, crtc, 0x16); /* Vertical blank end */ csel = inb(0x3cc); csel &= 0x0d; csel |= 0xe2; - outb(csel, 0x3cc); + outb(csel, 0x3c2); } static void vga_set_80x30(void) { - vga_set_480_scanlines(0xdf); + vga_set_480_scanlines(30*16); } static void vga_set_80x34(void) { vga_set_14font(); - vga_set_480_scanlines(0xdb); + vga_set_480_scanlines(34*14); } static void vga_set_80x60(void) { vga_set_8font(); - vga_set_480_scanlines(0xdf); + vga_set_480_scanlines(60*8); } static int vga_set_mode(struct mode_info *mode) Index: linux-2.6-tip/arch/x86/boot/voyager.c =================================================================== --- linux-2.6-tip.orig/arch/x86/boot/voyager.c +++ /dev/null @@ -1,40 +0,0 @@ -/* -*- linux-c -*- ------------------------------------------------------- * - * - * Copyright (C) 1991, 1992 Linus Torvalds - * Copyright 2007 rPath, Inc. - All Rights Reserved - * - * This file is part of the Linux kernel, and is made available under - * the terms of the GNU General Public License version 2. - * - * ----------------------------------------------------------------------- */ - -/* - * Get the Voyager config information - */ - -#include "boot.h" - -int query_voyager(void) -{ - u8 err; - u16 es, di; - /* Abuse the apm_bios_info area for this */ - u8 *data_ptr = (u8 *)&boot_params.apm_bios_info; - - data_ptr[0] = 0xff; /* Flag on config not found(?) */ - - asm("pushw %%es ; " - "int $0x15 ; " - "setc %0 ; " - "movw %%es, %1 ; " - "popw %%es" - : "=q" (err), "=r" (es), "=D" (di) - : "a" (0xffc0)); - - if (err) - return -1; /* Not Voyager */ - - set_fs(es); - copy_from_fs(data_ptr, di, 7); /* Table is 7 bytes apparently */ - return 0; -} Index: linux-2.6-tip/arch/x86/configs/i386_defconfig =================================================================== --- linux-2.6-tip.orig/arch/x86/configs/i386_defconfig +++ linux-2.6-tip/arch/x86/configs/i386_defconfig @@ -1,14 +1,13 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.27-rc5 -# Wed Sep 3 17:23:09 2008 +# Linux kernel version: 2.6.29-rc4 +# Tue Feb 24 15:50:58 2009 # # CONFIG_64BIT is not set CONFIG_X86_32=y # CONFIG_X86_64 is not set CONFIG_X86=y CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" -# CONFIG_GENERIC_LOCKBREAK is not set CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y @@ -24,16 +23,14 @@ CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y -# CONFIG_GENERIC_GPIO is not set CONFIG_ARCH_MAY_HAVE_PC_FDC=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y -# CONFIG_ARCH_HAS_ILOG2_U32 is not set -# CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y # CONFIG_GENERIC_TIME_VSYSCALL is not set CONFIG_ARCH_HAS_CPU_RELAX=y +CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y # CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set @@ -42,12 +39,12 @@ CONFIG_ARCH_SUSPEND_POSSIBLE=y # CONFIG_ZONE_DMA32 is not set CONFIG_ARCH_POPULATES_NODE_MAP=y # CONFIG_AUDIT_ARCH is not set -CONFIG_ARCH_SUPPORTS_AOUT=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_X86_SMP=y +CONFIG_USE_GENERIC_SMP_HELPERS=y CONFIG_X86_32_SMP=y CONFIG_X86_HT=y CONFIG_X86_BIOS_REBOOT=y @@ -76,30 +73,44 @@ CONFIG_TASK_IO_ACCOUNTING=y CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_TREE=y + +# +# RCU Subsystem +# +# CONFIG_CLASSIC_RCU is not set +CONFIG_TREE_RCU=y +# CONFIG_PREEMPT_RCU is not set +# CONFIG_RCU_TRACE is not set +CONFIG_RCU_FANOUT=32 +# CONFIG_RCU_FANOUT_EXACT is not set +# CONFIG_TREE_RCU_TRACE is not set +# CONFIG_PREEMPT_RCU_TRACE is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=18 -CONFIG_CGROUPS=y -# CONFIG_CGROUP_DEBUG is not set -CONFIG_CGROUP_NS=y -# CONFIG_CGROUP_DEVICE is not set -CONFIG_CPUSETS=y CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_GROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_RT_GROUP_SCHED is not set # CONFIG_USER_SCHED is not set CONFIG_CGROUP_SCHED=y +CONFIG_CGROUPS=y +# CONFIG_CGROUP_DEBUG is not set +CONFIG_CGROUP_NS=y +CONFIG_CGROUP_FREEZER=y +# CONFIG_CGROUP_DEVICE is not set +CONFIG_CPUSETS=y +CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_CPUACCT=y CONFIG_RESOURCE_COUNTERS=y # CONFIG_CGROUP_MEM_RES_CTLR is not set # CONFIG_SYSFS_DEPRECATED_V2 is not set -CONFIG_PROC_PID_CPUSET=y CONFIG_RELAY=y CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y +CONFIG_NET_NS=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_CC_OPTIMIZE_FOR_SIZE=y @@ -124,12 +135,15 @@ CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y +CONFIG_AIO=y CONFIG_VM_EVENT_COUNTERS=y +CONFIG_PCI_QUIRKS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_PROFILING=y +CONFIG_TRACEPOINTS=y CONFIG_MARKERS=y # CONFIG_OPROFILE is not set CONFIG_HAVE_OPROFILE=y @@ -139,15 +153,10 @@ CONFIG_KRETPROBES=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y -# CONFIG_HAVE_ARCH_TRACEHOOK is not set -# CONFIG_HAVE_DMA_ATTRS is not set -CONFIG_USE_GENERIC_SMP_HELPERS=y -# CONFIG_HAVE_CLK is not set -CONFIG_PROC_PAGE_MONITOR=y +CONFIG_HAVE_ARCH_TRACEHOOK=y CONFIG_HAVE_GENERIC_DMA_COHERENT=y CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y -# CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y # CONFIG_MODULE_FORCE_LOAD is not set @@ -155,12 +164,10 @@ CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set -CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_LBD is not set CONFIG_BLK_DEV_IO_TRACE=y -# CONFIG_LSF is not set CONFIG_BLK_DEV_BSG=y # CONFIG_BLK_DEV_INTEGRITY is not set @@ -176,7 +183,7 @@ CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" -CONFIG_CLASSIC_RCU=y +CONFIG_FREEZER=y # # Processor type and features @@ -186,15 +193,14 @@ CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y +CONFIG_SPARSE_IRQ=y CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y -CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set -# CONFIG_X86_VOYAGER is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_VSMP is not set # CONFIG_X86_RDC321X is not set -CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y +CONFIG_SCHED_OMIT_FRAME_POINTER=y # CONFIG_PARAVIRT_GUEST is not set # CONFIG_MEMTEST is not set # CONFIG_M386 is not set @@ -238,10 +244,19 @@ CONFIG_X86_TSC=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=4 CONFIG_X86_DEBUGCTLMSR=y +CONFIG_CPU_SUP_INTEL=y +CONFIG_CPU_SUP_CYRIX_32=y +CONFIG_CPU_SUP_AMD=y +CONFIG_CPU_SUP_CENTAUR_32=y +CONFIG_CPU_SUP_TRANSMETA_32=y +CONFIG_CPU_SUP_UMC_32=y +CONFIG_X86_DS=y +CONFIG_X86_PTRACE_BTS=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y CONFIG_DMI=y # CONFIG_IOMMU_HELPER is not set +# CONFIG_IOMMU_API is not set CONFIG_NR_CPUS=64 CONFIG_SCHED_SMT=y CONFIG_SCHED_MC=y @@ -250,12 +265,17 @@ CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y -# CONFIG_X86_MCE is not set +CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y +CONFIG_X86_MCE=y +CONFIG_X86_MCE_NONFATAL=y +CONFIG_X86_MCE_P4THERMAL=y CONFIG_VM86=y # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_X86_REBOOTFIXUPS=y CONFIG_MICROCODE=y +CONFIG_MICROCODE_INTEL=y +CONFIG_MICROCODE_AMD=y CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y @@ -264,6 +284,7 @@ CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_PAGE_OFFSET=0xC0000000 CONFIG_HIGHMEM=y +# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y @@ -274,14 +295,17 @@ CONFIG_FLATMEM_MANUAL=y CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y CONFIG_SPARSEMEM_STATIC=y -# CONFIG_SPARSEMEM_VMEMMAP_ENABLE is not set CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 -CONFIG_RESOURCES_64BIT=y +# CONFIG_PHYS_ADDR_T_64BIT is not set CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y +CONFIG_UNEVICTABLE_LRU=y CONFIG_HIGHPTE=y +CONFIG_X86_CHECK_BIOS_CORRUPTION=y +CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y +CONFIG_X86_RESERVE_LOW_64K=y # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y # CONFIG_MTRR_SANITIZER is not set @@ -302,10 +326,11 @@ CONFIG_PHYSICAL_START=0x1000000 CONFIG_PHYSICAL_ALIGN=0x200000 CONFIG_HOTPLUG_CPU=y # CONFIG_COMPAT_VDSO is not set +# CONFIG_CMDLINE_BOOL is not set CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y # -# Power management options +# Power management and ACPI options # CONFIG_PM=y CONFIG_PM_DEBUG=y @@ -331,19 +356,13 @@ CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_FAN=y CONFIG_ACPI_DOCK=y -# CONFIG_ACPI_BAY is not set CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_HOTPLUG_CPU=y CONFIG_ACPI_THERMAL=y -# CONFIG_ACPI_WMI is not set -# CONFIG_ACPI_ASUS is not set -# CONFIG_ACPI_TOSHIBA is not set # CONFIG_ACPI_CUSTOM_DSDT is not set CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set -CONFIG_ACPI_EC=y # CONFIG_ACPI_PCI_SLOT is not set -CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y CONFIG_ACPI_CONTAINER=y @@ -388,7 +407,6 @@ CONFIG_X86_ACPI_CPUFREQ=y # # shared options # -# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set # CONFIG_X86_SPEEDSTEP_LIB is not set CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y @@ -415,6 +433,7 @@ CONFIG_ARCH_SUPPORTS_MSI=y CONFIG_PCI_MSI=y # CONFIG_PCI_LEGACY is not set # CONFIG_PCI_DEBUG is not set +# CONFIG_PCI_STUB is not set CONFIG_HT_IRQ=y CONFIG_ISA_DMA_API=y # CONFIG_ISA is not set @@ -452,13 +471,17 @@ CONFIG_HOTPLUG_PCI=y # Executable file formats / Emulations # CONFIG_BINFMT_ELF=y +CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y +CONFIG_HAVE_AOUT=y # CONFIG_BINFMT_AOUT is not set CONFIG_BINFMT_MISC=y +CONFIG_HAVE_ATOMIC_IOMAP=y CONFIG_NET=y # # Networking options # +CONFIG_COMPAT_NET_DEV_OPS=y CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_UNIX=y @@ -519,7 +542,6 @@ CONFIG_DEFAULT_CUBIC=y # CONFIG_DEFAULT_RENO is not set CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_TCP_MD5SIG=y -# CONFIG_IP_VS is not set CONFIG_IPV6=y # CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_ROUTER_PREF is not set @@ -557,19 +579,21 @@ CONFIG_NF_CONNTRACK_IRC=y CONFIG_NF_CONNTRACK_SIP=y CONFIG_NF_CT_NETLINK=y CONFIG_NETFILTER_XTABLES=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y CONFIG_NETFILTER_XT_TARGET_MARK=y CONFIG_NETFILTER_XT_TARGET_NFLOG=y CONFIG_NETFILTER_XT_TARGET_SECMARK=y -CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y CONFIG_NETFILTER_XT_TARGET_TCPMSS=y CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y CONFIG_NETFILTER_XT_MATCH_MARK=y CONFIG_NETFILTER_XT_MATCH_POLICY=y CONFIG_NETFILTER_XT_MATCH_STATE=y +# CONFIG_IP_VS is not set # # IP: Netfilter Configuration # +CONFIG_NF_DEFRAG_IPV4=y CONFIG_NF_CONNTRACK_IPV4=y CONFIG_NF_CONNTRACK_PROC_COMPAT=y CONFIG_IP_NF_IPTABLES=y @@ -595,8 +619,8 @@ CONFIG_IP_NF_MANGLE=y CONFIG_NF_CONNTRACK_IPV6=y CONFIG_IP6_NF_IPTABLES=y CONFIG_IP6_NF_MATCH_IPV6HEADER=y -CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_LOG=y +CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_REJECT=y CONFIG_IP6_NF_MANGLE=y # CONFIG_IP_DCCP is not set @@ -604,6 +628,7 @@ CONFIG_IP6_NF_MANGLE=y # CONFIG_TIPC is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set +# CONFIG_NET_DSA is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set CONFIG_LLC=y @@ -623,6 +648,7 @@ CONFIG_NET_SCHED=y # CONFIG_NET_SCH_HTB is not set # CONFIG_NET_SCH_HFSC is not set # CONFIG_NET_SCH_PRIO is not set +# CONFIG_NET_SCH_MULTIQ is not set # CONFIG_NET_SCH_RED is not set # CONFIG_NET_SCH_SFQ is not set # CONFIG_NET_SCH_TEQL is not set @@ -630,6 +656,7 @@ CONFIG_NET_SCHED=y # CONFIG_NET_SCH_GRED is not set # CONFIG_NET_SCH_DSMARK is not set # CONFIG_NET_SCH_NETEM is not set +# CONFIG_NET_SCH_DRR is not set # CONFIG_NET_SCH_INGRESS is not set # @@ -644,6 +671,7 @@ CONFIG_NET_CLS=y # CONFIG_NET_CLS_RSVP is not set # CONFIG_NET_CLS_RSVP6 is not set # CONFIG_NET_CLS_FLOW is not set +# CONFIG_NET_CLS_CGROUP is not set CONFIG_NET_EMATCH=y CONFIG_NET_EMATCH_STACK=32 # CONFIG_NET_EMATCH_CMP is not set @@ -659,7 +687,9 @@ CONFIG_NET_CLS_ACT=y # CONFIG_NET_ACT_NAT is not set # CONFIG_NET_ACT_PEDIT is not set # CONFIG_NET_ACT_SIMP is not set +# CONFIG_NET_ACT_SKBEDIT is not set CONFIG_NET_SCH_FIFO=y +# CONFIG_DCB is not set # # Network testing @@ -676,29 +706,33 @@ CONFIG_HAMRADIO=y # CONFIG_IRDA is not set # CONFIG_BT is not set # CONFIG_AF_RXRPC is not set +# CONFIG_PHONET is not set CONFIG_FIB_RULES=y - -# -# Wireless -# +CONFIG_WIRELESS=y CONFIG_CFG80211=y +# CONFIG_CFG80211_REG_DEBUG is not set CONFIG_NL80211=y +CONFIG_WIRELESS_OLD_REGULATORY=y CONFIG_WIRELESS_EXT=y CONFIG_WIRELESS_EXT_SYSFS=y +# CONFIG_LIB80211 is not set CONFIG_MAC80211=y # # Rate control algorithm selection # -CONFIG_MAC80211_RC_PID=y -CONFIG_MAC80211_RC_DEFAULT_PID=y -CONFIG_MAC80211_RC_DEFAULT="pid" +CONFIG_MAC80211_RC_MINSTREL=y +# CONFIG_MAC80211_RC_DEFAULT_PID is not set +CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y +CONFIG_MAC80211_RC_DEFAULT="minstrel" # CONFIG_MAC80211_MESH is not set CONFIG_MAC80211_LEDS=y # CONFIG_MAC80211_DEBUGFS is not set # CONFIG_MAC80211_DEBUG_MENU is not set -# CONFIG_IEEE80211 is not set -# CONFIG_RFKILL is not set +# CONFIG_WIMAX is not set +CONFIG_RFKILL=y +# CONFIG_RFKILL_INPUT is not set +CONFIG_RFKILL_LEDS=y # CONFIG_NET_9P is not set # @@ -722,7 +756,7 @@ CONFIG_PROC_EVENTS=y # CONFIG_MTD is not set # CONFIG_PARPORT is not set CONFIG_PNP=y -# CONFIG_PNP_DEBUG is not set +CONFIG_PNP_DEBUG_MESSAGES=y # # Protocols @@ -750,20 +784,19 @@ CONFIG_BLK_DEV_RAM_SIZE=16384 CONFIG_MISC_DEVICES=y # CONFIG_IBM_ASM is not set # CONFIG_PHANTOM is not set -# CONFIG_EEPROM_93CX6 is not set # CONFIG_SGI_IOC4 is not set # CONFIG_TIFM_CORE is not set -# CONFIG_ACER_WMI is not set -# CONFIG_ASUS_LAPTOP is not set -# CONFIG_FUJITSU_LAPTOP is not set -# CONFIG_TC1100_WMI is not set -# CONFIG_MSI_LAPTOP is not set -# CONFIG_COMPAL_LAPTOP is not set -# CONFIG_SONY_LAPTOP is not set -# CONFIG_THINKPAD_ACPI is not set -# CONFIG_INTEL_MENLOW is not set +# CONFIG_ICS932S401 is not set # CONFIG_ENCLOSURE_SERVICES is not set # CONFIG_HP_ILO is not set +# CONFIG_C2PORT is not set + +# +# EEPROM support +# +# CONFIG_EEPROM_AT24 is not set +# CONFIG_EEPROM_LEGACY is not set +# CONFIG_EEPROM_93CX6 is not set CONFIG_HAVE_IDE=y # CONFIG_IDE is not set @@ -802,7 +835,7 @@ CONFIG_SCSI_WAIT_SCAN=m # CONFIG_SCSI_SPI_ATTRS=y # CONFIG_SCSI_FC_ATTRS is not set -CONFIG_SCSI_ISCSI_ATTRS=y +# CONFIG_SCSI_ISCSI_ATTRS is not set # CONFIG_SCSI_SAS_ATTRS is not set # CONFIG_SCSI_SAS_LIBSAS is not set # CONFIG_SCSI_SRP_ATTRS is not set @@ -875,6 +908,7 @@ CONFIG_PATA_OLDPIIX=y CONFIG_PATA_SCH=y CONFIG_MD=y CONFIG_BLK_DEV_MD=y +CONFIG_MD_AUTODETECT=y # CONFIG_MD_LINEAR is not set # CONFIG_MD_RAID0 is not set # CONFIG_MD_RAID1 is not set @@ -930,6 +964,9 @@ CONFIG_PHYLIB=y # CONFIG_BROADCOM_PHY is not set # CONFIG_ICPLUS_PHY is not set # CONFIG_REALTEK_PHY is not set +# CONFIG_NATIONAL_PHY is not set +# CONFIG_STE10XP is not set +# CONFIG_LSI_ET1011C_PHY is not set # CONFIG_FIXED_PHY is not set # CONFIG_MDIO_BITBANG is not set CONFIG_NET_ETHERNET=y @@ -953,6 +990,9 @@ CONFIG_NET_TULIP=y # CONFIG_IBM_NEW_EMAC_RGMII is not set # CONFIG_IBM_NEW_EMAC_TAH is not set # CONFIG_IBM_NEW_EMAC_EMAC4 is not set +# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set +# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set +# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set @@ -960,7 +1000,6 @@ CONFIG_NET_PCI=y # CONFIG_B44 is not set CONFIG_FORCEDETH=y # CONFIG_FORCEDETH_NAPI is not set -# CONFIG_EEPRO100 is not set CONFIG_E100=y # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set @@ -974,15 +1013,16 @@ CONFIG_8139TOO=y # CONFIG_R6040 is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set +# CONFIG_SMSC9420 is not set # CONFIG_SUNDANCE is not set # CONFIG_TLAN is not set # CONFIG_VIA_RHINE is not set # CONFIG_SC92031 is not set +# CONFIG_ATL2 is not set CONFIG_NETDEV_1000=y # CONFIG_ACENIC is not set # CONFIG_DL2K is not set CONFIG_E1000=y -# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set CONFIG_E1000E=y # CONFIG_IP1000 is not set # CONFIG_IGB is not set @@ -1000,18 +1040,23 @@ CONFIG_BNX2=y # CONFIG_QLA3XXX is not set # CONFIG_ATL1 is not set # CONFIG_ATL1E is not set +# CONFIG_JME is not set CONFIG_NETDEV_10000=y # CONFIG_CHELSIO_T1 is not set +CONFIG_CHELSIO_T3_DEPENDS=y # CONFIG_CHELSIO_T3 is not set +# CONFIG_ENIC is not set # CONFIG_IXGBE is not set # CONFIG_IXGB is not set # CONFIG_S2IO is not set # CONFIG_MYRI10GE is not set # CONFIG_NETXEN_NIC is not set # CONFIG_NIU is not set +# CONFIG_MLX4_EN is not set # CONFIG_MLX4_CORE is not set # CONFIG_TEHUTI is not set # CONFIG_BNX2X is not set +# CONFIG_QLGE is not set # CONFIG_SFC is not set CONFIG_TR=y # CONFIG_IBMOL is not set @@ -1025,9 +1070,8 @@ CONFIG_TR=y # CONFIG_WLAN_PRE80211 is not set CONFIG_WLAN_80211=y # CONFIG_PCMCIA_RAYCS is not set -# CONFIG_IPW2100 is not set -# CONFIG_IPW2200 is not set # CONFIG_LIBERTAS is not set +# CONFIG_LIBERTAS_THINFIRM is not set # CONFIG_AIRO is not set # CONFIG_HERMES is not set # CONFIG_ATMEL is not set @@ -1044,6 +1088,8 @@ CONFIG_WLAN_80211=y CONFIG_ATH5K=y # CONFIG_ATH5K_DEBUG is not set # CONFIG_ATH9K is not set +# CONFIG_IPW2100 is not set +# CONFIG_IPW2200 is not set # CONFIG_IWLCORE is not set # CONFIG_IWLWIFI_LEDS is not set # CONFIG_IWLAGN is not set @@ -1055,6 +1101,10 @@ CONFIG_ATH5K=y # CONFIG_RT2X00 is not set # +# Enable WiMAX (Networking options) to see the WiMAX drivers +# + +# # USB Network Adapters # # CONFIG_USB_CATC is not set @@ -1062,6 +1112,7 @@ CONFIG_ATH5K=y # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set +# CONFIG_USB_HSO is not set CONFIG_NET_PCMCIA=y # CONFIG_PCMCIA_3C589 is not set # CONFIG_PCMCIA_3C574 is not set @@ -1123,6 +1174,7 @@ CONFIG_MOUSE_PS2_LOGIPS2PP=y CONFIG_MOUSE_PS2_SYNAPTICS=y CONFIG_MOUSE_PS2_LIFEBOOK=y CONFIG_MOUSE_PS2_TRACKPOINT=y +# CONFIG_MOUSE_PS2_ELANTECH is not set # CONFIG_MOUSE_PS2_TOUCHKIT is not set # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_APPLETOUCH is not set @@ -1160,15 +1212,16 @@ CONFIG_INPUT_TOUCHSCREEN=y # CONFIG_TOUCHSCREEN_FUJITSU is not set # CONFIG_TOUCHSCREEN_GUNZE is not set # CONFIG_TOUCHSCREEN_ELO is not set +# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set # CONFIG_TOUCHSCREEN_MTOUCH is not set # CONFIG_TOUCHSCREEN_INEXIO is not set # CONFIG_TOUCHSCREEN_MK712 is not set # CONFIG_TOUCHSCREEN_PENMOUNT is not set # CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set # CONFIG_TOUCHSCREEN_TOUCHWIN is not set -# CONFIG_TOUCHSCREEN_UCB1400 is not set # CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set # CONFIG_TOUCHSCREEN_TOUCHIT213 is not set +# CONFIG_TOUCHSCREEN_TSC2007 is not set CONFIG_INPUT_MISC=y # CONFIG_INPUT_PCSPKR is not set # CONFIG_INPUT_APANEL is not set @@ -1179,6 +1232,7 @@ CONFIG_INPUT_MISC=y # CONFIG_INPUT_KEYSPAN_REMOTE is not set # CONFIG_INPUT_POWERMATE is not set # CONFIG_INPUT_YEALINK is not set +# CONFIG_INPUT_CM109 is not set # CONFIG_INPUT_UINPUT is not set # @@ -1245,6 +1299,7 @@ CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y +# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set # CONFIG_LEGACY_PTYS is not set # CONFIG_IPMI_HANDLER is not set CONFIG_HW_RANDOM=y @@ -1279,6 +1334,7 @@ CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y # CONFIG_I2C_CHARDEV is not set CONFIG_I2C_HELPER_AUTO=y +CONFIG_I2C_ALGOBIT=y # # I2C Hardware Bus support @@ -1331,8 +1387,6 @@ CONFIG_I2C_I801=y # Miscellaneous I2C Chip support # # CONFIG_DS1682 is not set -# CONFIG_EEPROM_AT24 is not set -# CONFIG_EEPROM_LEGACY is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_PCF8575 is not set # CONFIG_SENSORS_PCA9539 is not set @@ -1351,8 +1405,78 @@ CONFIG_POWER_SUPPLY=y # CONFIG_POWER_SUPPLY_DEBUG is not set # CONFIG_PDA_POWER is not set # CONFIG_BATTERY_DS2760 is not set -# CONFIG_HWMON is not set +# CONFIG_BATTERY_BQ27x00 is not set +CONFIG_HWMON=y +# CONFIG_HWMON_VID is not set +# CONFIG_SENSORS_ABITUGURU is not set +# CONFIG_SENSORS_ABITUGURU3 is not set +# CONFIG_SENSORS_AD7414 is not set +# CONFIG_SENSORS_AD7418 is not set +# CONFIG_SENSORS_ADM1021 is not set +# CONFIG_SENSORS_ADM1025 is not set +# CONFIG_SENSORS_ADM1026 is not set +# CONFIG_SENSORS_ADM1029 is not set +# CONFIG_SENSORS_ADM1031 is not set +# CONFIG_SENSORS_ADM9240 is not set +# CONFIG_SENSORS_ADT7462 is not set +# CONFIG_SENSORS_ADT7470 is not set +# CONFIG_SENSORS_ADT7473 is not set +# CONFIG_SENSORS_ADT7475 is not set +# CONFIG_SENSORS_K8TEMP is not set +# CONFIG_SENSORS_ASB100 is not set +# CONFIG_SENSORS_ATXP1 is not set +# CONFIG_SENSORS_DS1621 is not set +# CONFIG_SENSORS_I5K_AMB is not set +# CONFIG_SENSORS_F71805F is not set +# CONFIG_SENSORS_F71882FG is not set +# CONFIG_SENSORS_F75375S is not set +# CONFIG_SENSORS_FSCHER is not set +# CONFIG_SENSORS_FSCPOS is not set +# CONFIG_SENSORS_FSCHMD is not set +# CONFIG_SENSORS_GL518SM is not set +# CONFIG_SENSORS_GL520SM is not set +# CONFIG_SENSORS_CORETEMP is not set +# CONFIG_SENSORS_IT87 is not set +# CONFIG_SENSORS_LM63 is not set +# CONFIG_SENSORS_LM75 is not set +# CONFIG_SENSORS_LM77 is not set +# CONFIG_SENSORS_LM78 is not set +# CONFIG_SENSORS_LM80 is not set +# CONFIG_SENSORS_LM83 is not set +# CONFIG_SENSORS_LM85 is not set +# CONFIG_SENSORS_LM87 is not set +# CONFIG_SENSORS_LM90 is not set +# CONFIG_SENSORS_LM92 is not set +# CONFIG_SENSORS_LM93 is not set +# CONFIG_SENSORS_LTC4245 is not set +# CONFIG_SENSORS_MAX1619 is not set +# CONFIG_SENSORS_MAX6650 is not set +# CONFIG_SENSORS_PC87360 is not set +# CONFIG_SENSORS_PC87427 is not set +# CONFIG_SENSORS_SIS5595 is not set +# CONFIG_SENSORS_DME1737 is not set +# CONFIG_SENSORS_SMSC47M1 is not set +# CONFIG_SENSORS_SMSC47M192 is not set +# CONFIG_SENSORS_SMSC47B397 is not set +# CONFIG_SENSORS_ADS7828 is not set +# CONFIG_SENSORS_THMC50 is not set +# CONFIG_SENSORS_VIA686A is not set +# CONFIG_SENSORS_VT1211 is not set +# CONFIG_SENSORS_VT8231 is not set +# CONFIG_SENSORS_W83781D is not set +# CONFIG_SENSORS_W83791D is not set +# CONFIG_SENSORS_W83792D is not set +# CONFIG_SENSORS_W83793 is not set +# CONFIG_SENSORS_W83L785TS is not set +# CONFIG_SENSORS_W83L786NG is not set +# CONFIG_SENSORS_W83627HF is not set +# CONFIG_SENSORS_W83627EHF is not set +# CONFIG_SENSORS_HDAPS is not set +# CONFIG_SENSORS_LIS3LV02D is not set +# CONFIG_SENSORS_APPLESMC is not set +# CONFIG_HWMON_DEBUG_CHIP is not set CONFIG_THERMAL=y +# CONFIG_THERMAL_HWMON is not set CONFIG_WATCHDOG=y # CONFIG_WATCHDOG_NOWAYOUT is not set @@ -1372,6 +1496,7 @@ CONFIG_WATCHDOG=y # CONFIG_I6300ESB_WDT is not set # CONFIG_ITCO_WDT is not set # CONFIG_IT8712F_WDT is not set +# CONFIG_IT87_WDT is not set # CONFIG_HP_WATCHDOG is not set # CONFIG_SC1200_WDT is not set # CONFIG_PC87413_WDT is not set @@ -1379,9 +1504,11 @@ CONFIG_WATCHDOG=y # CONFIG_SBC8360_WDT is not set # CONFIG_SBC7240_WDT is not set # CONFIG_CPU5_WDT is not set +# CONFIG_SMSC_SCH311X_WDT is not set # CONFIG_SMSC37B787_WDT is not set # CONFIG_W83627HF_WDT is not set # CONFIG_W83697HF_WDT is not set +# CONFIG_W83697UG_WDT is not set # CONFIG_W83877F_WDT is not set # CONFIG_W83977F_WDT is not set # CONFIG_MACHZ_WDT is not set @@ -1397,11 +1524,11 @@ CONFIG_WATCHDOG=y # USB-based Watchdog Cards # # CONFIG_USBPCWATCHDOG is not set +CONFIG_SSB_POSSIBLE=y # # Sonics Silicon Backplane # -CONFIG_SSB_POSSIBLE=y # CONFIG_SSB is not set # @@ -1410,7 +1537,13 @@ CONFIG_SSB_POSSIBLE=y # CONFIG_MFD_CORE is not set # CONFIG_MFD_SM501 is not set # CONFIG_HTC_PASIC3 is not set +# CONFIG_TWL4030_CORE is not set # CONFIG_MFD_TMIO is not set +# CONFIG_PMIC_DA903X is not set +# CONFIG_MFD_WM8400 is not set +# CONFIG_MFD_WM8350_I2C is not set +# CONFIG_MFD_PCF50633 is not set +# CONFIG_REGULATOR is not set # # Multimedia devices @@ -1450,6 +1583,7 @@ CONFIG_DRM=y # CONFIG_DRM_I810 is not set # CONFIG_DRM_I830 is not set CONFIG_DRM_I915=y +# CONFIG_DRM_I915_KMS is not set # CONFIG_DRM_MGA is not set # CONFIG_DRM_SIS is not set # CONFIG_DRM_VIA is not set @@ -1459,6 +1593,7 @@ CONFIG_DRM_I915=y CONFIG_FB=y # CONFIG_FIRMWARE_EDID is not set # CONFIG_FB_DDC is not set +# CONFIG_FB_BOOT_VESA_SUPPORT is not set CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y @@ -1487,7 +1622,6 @@ CONFIG_FB_TILEBLITTING=y # CONFIG_FB_UVESA is not set # CONFIG_FB_VESA is not set CONFIG_FB_EFI=y -# CONFIG_FB_IMAC is not set # CONFIG_FB_N411 is not set # CONFIG_FB_HGA is not set # CONFIG_FB_S1D13XXX is not set @@ -1503,6 +1637,7 @@ CONFIG_FB_EFI=y # CONFIG_FB_S3 is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set +# CONFIG_FB_VIA is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set @@ -1515,12 +1650,15 @@ CONFIG_FB_EFI=y # CONFIG_FB_CARMINE is not set # CONFIG_FB_GEODE is not set # CONFIG_FB_VIRTUAL is not set +# CONFIG_FB_METRONOME is not set +# CONFIG_FB_MB862XX is not set CONFIG_BACKLIGHT_LCD_SUPPORT=y # CONFIG_LCD_CLASS_DEVICE is not set CONFIG_BACKLIGHT_CLASS_DEVICE=y -# CONFIG_BACKLIGHT_CORGI is not set +CONFIG_BACKLIGHT_GENERIC=y # CONFIG_BACKLIGHT_PROGEAR is not set # CONFIG_BACKLIGHT_MBP_NVIDIA is not set +# CONFIG_BACKLIGHT_SAHARA is not set # # Display device support @@ -1540,10 +1678,12 @@ CONFIG_LOGO=y # CONFIG_LOGO_LINUX_VGA16 is not set CONFIG_LOGO_LINUX_CLUT224=y CONFIG_SOUND=y +CONFIG_SOUND_OSS_CORE=y CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_HWDEP=y +CONFIG_SND_JACK=y CONFIG_SND_SEQUENCER=y CONFIG_SND_SEQ_DUMMY=y CONFIG_SND_OSSEMUL=y @@ -1551,6 +1691,8 @@ CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y CONFIG_SND_PCM_OSS_PLUGINS=y CONFIG_SND_SEQUENCER_OSS=y +CONFIG_SND_HRTIMER=y +CONFIG_SND_SEQ_HRTIMER_DEFAULT=y CONFIG_SND_DYNAMIC_MINORS=y CONFIG_SND_SUPPORT_OLD_API=y CONFIG_SND_VERBOSE_PROCFS=y @@ -1605,11 +1747,16 @@ CONFIG_SND_PCI=y # CONFIG_SND_FM801 is not set CONFIG_SND_HDA_INTEL=y CONFIG_SND_HDA_HWDEP=y +# CONFIG_SND_HDA_RECONFIG is not set +# CONFIG_SND_HDA_INPUT_BEEP is not set CONFIG_SND_HDA_CODEC_REALTEK=y CONFIG_SND_HDA_CODEC_ANALOG=y CONFIG_SND_HDA_CODEC_SIGMATEL=y CONFIG_SND_HDA_CODEC_VIA=y CONFIG_SND_HDA_CODEC_ATIHDMI=y +CONFIG_SND_HDA_CODEC_NVHDMI=y +CONFIG_SND_HDA_CODEC_INTELHDMI=y +CONFIG_SND_HDA_ELD=y CONFIG_SND_HDA_CODEC_CONEXANT=y CONFIG_SND_HDA_CODEC_CMEDIA=y CONFIG_SND_HDA_CODEC_SI3054=y @@ -1643,6 +1790,7 @@ CONFIG_SND_USB=y # CONFIG_SND_USB_AUDIO is not set # CONFIG_SND_USB_USX2Y is not set # CONFIG_SND_USB_CAIAQ is not set +# CONFIG_SND_USB_US122L is not set CONFIG_SND_PCMCIA=y # CONFIG_SND_VXPOCKET is not set # CONFIG_SND_PDAUDIOCF is not set @@ -1657,15 +1805,37 @@ CONFIG_HIDRAW=y # USB Input Devices # CONFIG_USB_HID=y -CONFIG_USB_HIDINPUT_POWERBOOK=y -CONFIG_HID_FF=y CONFIG_HID_PID=y +CONFIG_USB_HIDDEV=y + +# +# Special HID drivers +# +CONFIG_HID_COMPAT=y +CONFIG_HID_A4TECH=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_EZKEY=y +CONFIG_HID_GYRATION=y +CONFIG_HID_LOGITECH=y CONFIG_LOGITECH_FF=y # CONFIG_LOGIRUMBLEPAD2_FF is not set +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_NTRIG=y +CONFIG_HID_PANTHERLORD=y CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SONY=y +CONFIG_HID_SUNPLUS=y +# CONFIG_GREENASIA_FF is not set +CONFIG_HID_TOPSEED=y CONFIG_THRUSTMASTER_FF=y CONFIG_ZEROPLUS_FF=y -CONFIG_USB_HIDDEV=y CONFIG_USB_SUPPORT=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y @@ -1683,6 +1853,8 @@ CONFIG_USB_DEVICEFS=y CONFIG_USB_SUSPEND=y # CONFIG_USB_OTG is not set CONFIG_USB_MON=y +# CONFIG_USB_WUSB is not set +# CONFIG_USB_WUSB_CBAF is not set # # USB Host Controller Drivers @@ -1691,6 +1863,7 @@ CONFIG_USB_MON=y CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_ROOT_HUB_TT is not set # CONFIG_USB_EHCI_TT_NEWSCHED is not set +# CONFIG_USB_OXU210HP_HCD is not set # CONFIG_USB_ISP116X_HCD is not set # CONFIG_USB_ISP1760_HCD is not set CONFIG_USB_OHCI_HCD=y @@ -1700,6 +1873,8 @@ CONFIG_USB_OHCI_LITTLE_ENDIAN=y CONFIG_USB_UHCI_HCD=y # CONFIG_USB_SL811_HCD is not set # CONFIG_USB_R8A66597_HCD is not set +# CONFIG_USB_WHCI_HCD is not set +# CONFIG_USB_HWA_HCD is not set # # USB Device Class drivers @@ -1707,20 +1882,20 @@ CONFIG_USB_UHCI_HCD=y # CONFIG_USB_ACM is not set CONFIG_USB_PRINTER=y # CONFIG_USB_WDM is not set +# CONFIG_USB_TMC is not set # -# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may also be needed; # # -# may also be needed; see USB_STORAGE Help for more information +# see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set # CONFIG_USB_STORAGE_DATAFAB is not set # CONFIG_USB_STORAGE_FREECOM is not set # CONFIG_USB_STORAGE_ISD200 is not set -# CONFIG_USB_STORAGE_DPCM is not set # CONFIG_USB_STORAGE_USBAT is not set # CONFIG_USB_STORAGE_SDDR09 is not set # CONFIG_USB_STORAGE_SDDR55 is not set @@ -1728,7 +1903,6 @@ CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_ALAUDA is not set # CONFIG_USB_STORAGE_ONETOUCH is not set # CONFIG_USB_STORAGE_KARMA is not set -# CONFIG_USB_STORAGE_SIERRA is not set # CONFIG_USB_STORAGE_CYPRESS_ATACB is not set CONFIG_USB_LIBUSUAL=y @@ -1749,6 +1923,7 @@ CONFIG_USB_LIBUSUAL=y # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_ADUTUX is not set +# CONFIG_USB_SEVSEG is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set @@ -1766,7 +1941,13 @@ CONFIG_USB_LIBUSUAL=y # CONFIG_USB_IOWARRIOR is not set # CONFIG_USB_TEST is not set # CONFIG_USB_ISIGHTFW is not set +# CONFIG_USB_VST is not set # CONFIG_USB_GADGET is not set + +# +# OTG and related infrastructure +# +# CONFIG_UWB is not set # CONFIG_MMC is not set # CONFIG_MEMSTICK is not set CONFIG_NEW_LEDS=y @@ -1775,6 +1956,7 @@ CONFIG_LEDS_CLASS=y # # LED drivers # +# CONFIG_LEDS_ALIX2 is not set # CONFIG_LEDS_PCA9532 is not set # CONFIG_LEDS_CLEVO_MAIL is not set # CONFIG_LEDS_PCA955X is not set @@ -1785,6 +1967,7 @@ CONFIG_LEDS_CLASS=y CONFIG_LEDS_TRIGGERS=y # CONFIG_LEDS_TRIGGER_TIMER is not set # CONFIG_LEDS_TRIGGER_HEARTBEAT is not set +# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set # CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set # CONFIG_ACCESSIBILITY is not set # CONFIG_INFINIBAND is not set @@ -1824,6 +2007,7 @@ CONFIG_RTC_INTF_DEV=y # CONFIG_RTC_DRV_M41T80 is not set # CONFIG_RTC_DRV_S35390A is not set # CONFIG_RTC_DRV_FM3130 is not set +# CONFIG_RTC_DRV_RX8581 is not set # # SPI RTC drivers @@ -1833,12 +2017,15 @@ CONFIG_RTC_INTF_DEV=y # Platform RTC drivers # CONFIG_RTC_DRV_CMOS=y +# CONFIG_RTC_DRV_DS1286 is not set # CONFIG_RTC_DRV_DS1511 is not set # CONFIG_RTC_DRV_DS1553 is not set # CONFIG_RTC_DRV_DS1742 is not set # CONFIG_RTC_DRV_STK17TA8 is not set # CONFIG_RTC_DRV_M48T86 is not set +# CONFIG_RTC_DRV_M48T35 is not set # CONFIG_RTC_DRV_M48T59 is not set +# CONFIG_RTC_DRV_BQ4802 is not set # CONFIG_RTC_DRV_V3020 is not set # @@ -1851,6 +2038,22 @@ CONFIG_DMADEVICES=y # # CONFIG_INTEL_IOATDMA is not set # CONFIG_UIO is not set +# CONFIG_STAGING is not set +CONFIG_X86_PLATFORM_DEVICES=y +# CONFIG_ACER_WMI is not set +# CONFIG_ASUS_LAPTOP is not set +# CONFIG_FUJITSU_LAPTOP is not set +# CONFIG_TC1100_WMI is not set +# CONFIG_MSI_LAPTOP is not set +# CONFIG_PANASONIC_LAPTOP is not set +# CONFIG_COMPAL_LAPTOP is not set +# CONFIG_SONY_LAPTOP is not set +# CONFIG_THINKPAD_ACPI is not set +# CONFIG_INTEL_MENLOW is not set +CONFIG_EEEPC_LAPTOP=y +# CONFIG_ACPI_WMI is not set +# CONFIG_ACPI_ASUS is not set +# CONFIG_ACPI_TOSHIBA is not set # # Firmware Drivers @@ -1861,8 +2064,7 @@ CONFIG_EFI_VARS=y # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_DMIID=y -CONFIG_ISCSI_IBFT_FIND=y -CONFIG_ISCSI_IBFT=y +# CONFIG_ISCSI_IBFT_FIND is not set # # File systems @@ -1872,21 +2074,24 @@ CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y CONFIG_EXT3_FS_SECURITY=y -# CONFIG_EXT4DEV_FS is not set +# CONFIG_EXT4_FS is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y +CONFIG_FILE_LOCKING=y # CONFIG_XFS_FS is not set # CONFIG_OCFS2_FS is not set +# CONFIG_BTRFS_FS is not set CONFIG_DNOTIFY=y CONFIG_INOTIFY=y CONFIG_INOTIFY_USER=y CONFIG_QUOTA=y CONFIG_QUOTA_NETLINK_INTERFACE=y # CONFIG_PRINT_QUOTA_WARNING is not set +CONFIG_QUOTA_TREE=y # CONFIG_QFMT_V1 is not set CONFIG_QFMT_V2=y CONFIG_QUOTACTL=y @@ -1920,16 +2125,14 @@ CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_PROC_VMCORE=y CONFIG_PROC_SYSCTL=y +CONFIG_PROC_PAGE_MONITOR=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y # CONFIG_CONFIGFS_FS is not set - -# -# Miscellaneous filesystems -# +CONFIG_MISC_FILESYSTEMS=y # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_ECRYPT_FS is not set @@ -1939,6 +2142,7 @@ CONFIG_HUGETLB_PAGE=y # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set +# CONFIG_SQUASHFS is not set # CONFIG_VXFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_OMFS_FS is not set @@ -1960,6 +2164,7 @@ CONFIG_NFS_ACL_SUPPORT=y CONFIG_NFS_COMMON=y CONFIG_SUNRPC=y CONFIG_SUNRPC_GSS=y +# CONFIG_SUNRPC_REGISTER_V4 is not set CONFIG_RPCSEC_GSS_KRB5=y # CONFIG_RPCSEC_GSS_SPKM3 is not set # CONFIG_SMB_FS is not set @@ -2036,7 +2241,7 @@ CONFIG_NLS_UTF8=y # CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_PRINTK_TIME=y -CONFIG_ENABLE_WARN_DEPRECATED=y +# CONFIG_ENABLE_WARN_DEPRECATED is not set CONFIG_ENABLE_MUST_CHECK=y CONFIG_FRAME_WARN=2048 CONFIG_MAGIC_SYSRQ=y @@ -2066,33 +2271,54 @@ CONFIG_TIMER_STATS=y CONFIG_DEBUG_BUGVERBOSE=y # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_VM is not set +# CONFIG_DEBUG_VIRTUAL is not set # CONFIG_DEBUG_WRITECOUNT is not set CONFIG_DEBUG_MEMORY_INIT=y # CONFIG_DEBUG_LIST is not set # CONFIG_DEBUG_SG is not set +# CONFIG_DEBUG_NOTIFIERS is not set +CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_FRAME_POINTER=y # CONFIG_BOOT_PRINTK_DELAY is not set # CONFIG_RCU_TORTURE_TEST is not set +# CONFIG_RCU_CPU_STALL_DETECTOR is not set # CONFIG_KPROBES_SANITY_TEST is not set # CONFIG_BACKTRACE_SELF_TEST is not set +# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set # CONFIG_LKDTM is not set # CONFIG_FAULT_INJECTION is not set # CONFIG_LATENCYTOP is not set CONFIG_SYSCTL_SYSCALL_CHECK=y -CONFIG_HAVE_FTRACE=y +CONFIG_USER_STACKTRACE_SUPPORT=y +CONFIG_HAVE_FUNCTION_TRACER=y +CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y +CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y -# CONFIG_FTRACE is not set +CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y +CONFIG_HAVE_HW_BRANCH_TRACER=y + +# +# Tracers +# +# CONFIG_FUNCTION_TRACER is not set # CONFIG_IRQSOFF_TRACER is not set # CONFIG_SYSPROF_TRACER is not set # CONFIG_SCHED_TRACER is not set # CONFIG_CONTEXT_SWITCH_TRACER is not set +# CONFIG_BOOT_TRACER is not set +# CONFIG_TRACE_BRANCH_PROFILING is not set +# CONFIG_POWER_TRACER is not set +# CONFIG_STACK_TRACER is not set +# CONFIG_HW_BRANCH_TRACER is not set CONFIG_PROVIDE_OHCI1394_DMA_INIT=y +# CONFIG_DYNAMIC_PRINTK_DEBUG is not set # CONFIG_SAMPLES is not set CONFIG_HAVE_ARCH_KGDB=y # CONFIG_KGDB is not set # CONFIG_STRICT_DEVMEM is not set CONFIG_X86_VERBOSE_BOOTUP=y CONFIG_EARLY_PRINTK=y +CONFIG_EARLY_PRINTK_DBGP=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DEBUG_STACK_USAGE=y # CONFIG_DEBUG_PAGEALLOC is not set @@ -2123,8 +2349,10 @@ CONFIG_OPTIMIZE_INLINING=y CONFIG_KEYS=y CONFIG_KEYS_DEBUG_PROC_KEYS=y CONFIG_SECURITY=y +# CONFIG_SECURITYFS is not set CONFIG_SECURITY_NETWORK=y # CONFIG_SECURITY_NETWORK_XFRM is not set +# CONFIG_SECURITY_PATH is not set CONFIG_SECURITY_FILE_CAPABILITIES=y # CONFIG_SECURITY_ROOTPLUG is not set CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=65536 @@ -2135,7 +2363,6 @@ CONFIG_SECURITY_SELINUX_DISABLE=y CONFIG_SECURITY_SELINUX_DEVELOP=y CONFIG_SECURITY_SELINUX_AVC_STATS=y CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1 -# CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT is not set # CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set # CONFIG_SECURITY_SMACK is not set CONFIG_CRYPTO=y @@ -2143,11 +2370,18 @@ CONFIG_CRYPTO=y # # Crypto core or helper # +# CONFIG_CRYPTO_FIPS is not set CONFIG_CRYPTO_ALGAPI=y +CONFIG_CRYPTO_ALGAPI2=y CONFIG_CRYPTO_AEAD=y +CONFIG_CRYPTO_AEAD2=y CONFIG_CRYPTO_BLKCIPHER=y +CONFIG_CRYPTO_BLKCIPHER2=y CONFIG_CRYPTO_HASH=y +CONFIG_CRYPTO_HASH2=y +CONFIG_CRYPTO_RNG2=y CONFIG_CRYPTO_MANAGER=y +CONFIG_CRYPTO_MANAGER2=y # CONFIG_CRYPTO_GF128MUL is not set # CONFIG_CRYPTO_NULL is not set # CONFIG_CRYPTO_CRYPTD is not set @@ -2182,6 +2416,7 @@ CONFIG_CRYPTO_HMAC=y # Digest # # CONFIG_CRYPTO_CRC32C is not set +# CONFIG_CRYPTO_CRC32C_INTEL is not set # CONFIG_CRYPTO_MD4 is not set CONFIG_CRYPTO_MD5=y # CONFIG_CRYPTO_MICHAEL_MIC is not set @@ -2222,6 +2457,11 @@ CONFIG_CRYPTO_DES=y # # CONFIG_CRYPTO_DEFLATE is not set # CONFIG_CRYPTO_LZO is not set + +# +# Random Number Generation +# +# CONFIG_CRYPTO_ANSI_CPRNG is not set CONFIG_CRYPTO_HW=y # CONFIG_CRYPTO_DEV_PADLOCK is not set # CONFIG_CRYPTO_DEV_GEODE is not set @@ -2239,6 +2479,7 @@ CONFIG_VIRTUALIZATION=y CONFIG_BITREVERSE=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_FIND_NEXT_BIT=y +CONFIG_GENERIC_FIND_LAST_BIT=y # CONFIG_CRC_CCITT is not set # CONFIG_CRC16 is not set CONFIG_CRC_T10DIF=y Index: linux-2.6-tip/arch/x86/configs/x86_64_defconfig =================================================================== --- linux-2.6-tip.orig/arch/x86/configs/x86_64_defconfig +++ linux-2.6-tip/arch/x86/configs/x86_64_defconfig @@ -1,14 +1,13 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.27-rc5 -# Wed Sep 3 17:13:39 2008 +# Linux kernel version: 2.6.29-rc4 +# Tue Feb 24 15:44:16 2009 # CONFIG_64BIT=y # CONFIG_X86_32 is not set CONFIG_X86_64=y CONFIG_X86=y CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" -# CONFIG_GENERIC_LOCKBREAK is not set CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y @@ -23,17 +22,16 @@ CONFIG_ZONE_DMA=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y +CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y -# CONFIG_GENERIC_GPIO is not set CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_GENERIC_SPINLOCK=y # CONFIG_RWSEM_XCHGADD_ALGORITHM is not set -# CONFIG_ARCH_HAS_ILOG2_U32 is not set -# CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_ARCH_HAS_CPU_RELAX=y +CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y @@ -42,12 +40,12 @@ CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ZONE_DMA32=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_AUDIT_ARCH=y -CONFIG_ARCH_SUPPORTS_AOUT=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_X86_SMP=y +CONFIG_USE_GENERIC_SMP_HELPERS=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_X86_BIOS_REBOOT=y @@ -76,30 +74,44 @@ CONFIG_TASK_IO_ACCOUNTING=y CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_TREE=y + +# +# RCU Subsystem +# +# CONFIG_CLASSIC_RCU is not set +CONFIG_TREE_RCU=y +# CONFIG_PREEMPT_RCU is not set +# CONFIG_RCU_TRACE is not set +CONFIG_RCU_FANOUT=64 +# CONFIG_RCU_FANOUT_EXACT is not set +# CONFIG_TREE_RCU_TRACE is not set +# CONFIG_PREEMPT_RCU_TRACE is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=18 -CONFIG_CGROUPS=y -# CONFIG_CGROUP_DEBUG is not set -CONFIG_CGROUP_NS=y -# CONFIG_CGROUP_DEVICE is not set -CONFIG_CPUSETS=y CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_GROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_RT_GROUP_SCHED is not set # CONFIG_USER_SCHED is not set CONFIG_CGROUP_SCHED=y +CONFIG_CGROUPS=y +# CONFIG_CGROUP_DEBUG is not set +CONFIG_CGROUP_NS=y +CONFIG_CGROUP_FREEZER=y +# CONFIG_CGROUP_DEVICE is not set +CONFIG_CPUSETS=y +CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_CPUACCT=y CONFIG_RESOURCE_COUNTERS=y # CONFIG_CGROUP_MEM_RES_CTLR is not set # CONFIG_SYSFS_DEPRECATED_V2 is not set -CONFIG_PROC_PID_CPUSET=y CONFIG_RELAY=y CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y +CONFIG_NET_NS=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_CC_OPTIMIZE_FOR_SIZE=y @@ -124,12 +136,15 @@ CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y +CONFIG_AIO=y CONFIG_VM_EVENT_COUNTERS=y +CONFIG_PCI_QUIRKS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_PROFILING=y +CONFIG_TRACEPOINTS=y CONFIG_MARKERS=y # CONFIG_OPROFILE is not set CONFIG_HAVE_OPROFILE=y @@ -139,15 +154,10 @@ CONFIG_KRETPROBES=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y -# CONFIG_HAVE_ARCH_TRACEHOOK is not set -# CONFIG_HAVE_DMA_ATTRS is not set -CONFIG_USE_GENERIC_SMP_HELPERS=y -# CONFIG_HAVE_CLK is not set -CONFIG_PROC_PAGE_MONITOR=y +CONFIG_HAVE_ARCH_TRACEHOOK=y # CONFIG_HAVE_GENERIC_DMA_COHERENT is not set CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y -# CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y # CONFIG_MODULE_FORCE_LOAD is not set @@ -155,7 +165,6 @@ CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set -CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y CONFIG_BLK_DEV_IO_TRACE=y @@ -175,7 +184,7 @@ CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" -CONFIG_CLASSIC_RCU=y +CONFIG_FREEZER=y # # Processor type and features @@ -185,13 +194,14 @@ CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y +CONFIG_SPARSE_IRQ=y +# CONFIG_NUMA_MIGRATE_IRQ_DESC is not set CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y -CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set -# CONFIG_X86_VOYAGER is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_VSMP is not set +CONFIG_SCHED_OMIT_FRAME_POINTER=y # CONFIG_PARAVIRT_GUEST is not set # CONFIG_MEMTEST is not set # CONFIG_M386 is not set @@ -230,6 +240,11 @@ CONFIG_X86_CMPXCHG64=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=64 CONFIG_X86_DEBUGCTLMSR=y +CONFIG_CPU_SUP_INTEL=y +CONFIG_CPU_SUP_AMD=y +CONFIG_CPU_SUP_CENTAUR_64=y +CONFIG_X86_DS=y +CONFIG_X86_PTRACE_BTS=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y CONFIG_DMI=y @@ -237,8 +252,11 @@ CONFIG_GART_IOMMU=y CONFIG_CALGARY_IOMMU=y CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y CONFIG_AMD_IOMMU=y +CONFIG_AMD_IOMMU_STATS=y CONFIG_SWIOTLB=y CONFIG_IOMMU_HELPER=y +CONFIG_IOMMU_API=y +# CONFIG_MAXSMP is not set CONFIG_NR_CPUS=64 CONFIG_SCHED_SMT=y CONFIG_SCHED_MC=y @@ -247,12 +265,19 @@ CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y -# CONFIG_X86_MCE is not set +CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y +CONFIG_X86_MCE=y +CONFIG_X86_MCE_INTEL=y +CONFIG_X86_MCE_AMD=y # CONFIG_I8K is not set CONFIG_MICROCODE=y +CONFIG_MICROCODE_INTEL=y +CONFIG_MICROCODE_AMD=y CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y +CONFIG_ARCH_PHYS_ADDR_T_64BIT=y +CONFIG_DIRECT_GBPAGES=y CONFIG_NUMA=y CONFIG_K8_NUMA=y CONFIG_X86_64_ACPI_NUMA=y @@ -269,7 +294,6 @@ CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y CONFIG_NEED_MULTIPLE_NODES=y CONFIG_HAVE_MEMORY_PRESENT=y -# CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_VMEMMAP=y @@ -280,10 +304,14 @@ CONFIG_SPARSEMEM_VMEMMAP=y CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_MIGRATION=y -CONFIG_RESOURCES_64BIT=y +CONFIG_PHYS_ADDR_T_64BIT=y CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y +CONFIG_UNEVICTABLE_LRU=y +CONFIG_X86_CHECK_BIOS_CORRUPTION=y +CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y +CONFIG_X86_RESERVE_LOW_64K=y CONFIG_MTRR=y # CONFIG_MTRR_SANITIZER is not set CONFIG_X86_PAT=y @@ -302,11 +330,12 @@ CONFIG_PHYSICAL_START=0x1000000 CONFIG_PHYSICAL_ALIGN=0x200000 CONFIG_HOTPLUG_CPU=y # CONFIG_COMPAT_VDSO is not set +# CONFIG_CMDLINE_BOOL is not set CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y # -# Power management options +# Power management and ACPI options # CONFIG_ARCH_HIBERNATION_HEADER=y CONFIG_PM=y @@ -333,20 +362,14 @@ CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_FAN=y CONFIG_ACPI_DOCK=y -# CONFIG_ACPI_BAY is not set CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_HOTPLUG_CPU=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_NUMA=y -# CONFIG_ACPI_WMI is not set -# CONFIG_ACPI_ASUS is not set -# CONFIG_ACPI_TOSHIBA is not set # CONFIG_ACPI_CUSTOM_DSDT is not set CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set -CONFIG_ACPI_EC=y # CONFIG_ACPI_PCI_SLOT is not set -CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y CONFIG_ACPI_CONTAINER=y @@ -381,13 +404,17 @@ CONFIG_X86_ACPI_CPUFREQ=y # # shared options # -# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set # CONFIG_X86_SPEEDSTEP_LIB is not set CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y # +# Memory power savings +# +# CONFIG_I7300_IDLE is not set + +# # Bus options (PCI etc.) # CONFIG_PCI=y @@ -395,8 +422,10 @@ CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_DOMAINS=y CONFIG_DMAR=y +# CONFIG_DMAR_DEFAULT_ON is not set CONFIG_DMAR_GFX_WA=y CONFIG_DMAR_FLOPPY_WA=y +# CONFIG_INTR_REMAP is not set CONFIG_PCIEPORTBUS=y # CONFIG_HOTPLUG_PCI_PCIE is not set CONFIG_PCIEAER=y @@ -405,6 +434,7 @@ CONFIG_ARCH_SUPPORTS_MSI=y CONFIG_PCI_MSI=y # CONFIG_PCI_LEGACY is not set # CONFIG_PCI_DEBUG is not set +# CONFIG_PCI_STUB is not set CONFIG_HT_IRQ=y CONFIG_ISA_DMA_API=y CONFIG_K8_NB=y @@ -438,6 +468,8 @@ CONFIG_HOTPLUG_PCI=y # CONFIG_BINFMT_ELF=y CONFIG_COMPAT_BINFMT_ELF=y +CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y +# CONFIG_HAVE_AOUT is not set CONFIG_BINFMT_MISC=y CONFIG_IA32_EMULATION=y # CONFIG_IA32_AOUT is not set @@ -449,6 +481,7 @@ CONFIG_NET=y # # Networking options # +CONFIG_COMPAT_NET_DEV_OPS=y CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_UNIX=y @@ -509,7 +542,6 @@ CONFIG_DEFAULT_CUBIC=y # CONFIG_DEFAULT_RENO is not set CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_TCP_MD5SIG=y -# CONFIG_IP_VS is not set CONFIG_IPV6=y # CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_ROUTER_PREF is not set @@ -547,19 +579,21 @@ CONFIG_NF_CONNTRACK_IRC=y CONFIG_NF_CONNTRACK_SIP=y CONFIG_NF_CT_NETLINK=y CONFIG_NETFILTER_XTABLES=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y CONFIG_NETFILTER_XT_TARGET_MARK=y CONFIG_NETFILTER_XT_TARGET_NFLOG=y CONFIG_NETFILTER_XT_TARGET_SECMARK=y -CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y CONFIG_NETFILTER_XT_TARGET_TCPMSS=y CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y CONFIG_NETFILTER_XT_MATCH_MARK=y CONFIG_NETFILTER_XT_MATCH_POLICY=y CONFIG_NETFILTER_XT_MATCH_STATE=y +# CONFIG_IP_VS is not set # # IP: Netfilter Configuration # +CONFIG_NF_DEFRAG_IPV4=y CONFIG_NF_CONNTRACK_IPV4=y CONFIG_NF_CONNTRACK_PROC_COMPAT=y CONFIG_IP_NF_IPTABLES=y @@ -585,8 +619,8 @@ CONFIG_IP_NF_MANGLE=y CONFIG_NF_CONNTRACK_IPV6=y CONFIG_IP6_NF_IPTABLES=y CONFIG_IP6_NF_MATCH_IPV6HEADER=y -CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_LOG=y +CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_REJECT=y CONFIG_IP6_NF_MANGLE=y # CONFIG_IP_DCCP is not set @@ -594,6 +628,7 @@ CONFIG_IP6_NF_MANGLE=y # CONFIG_TIPC is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set +# CONFIG_NET_DSA is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set CONFIG_LLC=y @@ -613,6 +648,7 @@ CONFIG_NET_SCHED=y # CONFIG_NET_SCH_HTB is not set # CONFIG_NET_SCH_HFSC is not set # CONFIG_NET_SCH_PRIO is not set +# CONFIG_NET_SCH_MULTIQ is not set # CONFIG_NET_SCH_RED is not set # CONFIG_NET_SCH_SFQ is not set # CONFIG_NET_SCH_TEQL is not set @@ -620,6 +656,7 @@ CONFIG_NET_SCHED=y # CONFIG_NET_SCH_GRED is not set # CONFIG_NET_SCH_DSMARK is not set # CONFIG_NET_SCH_NETEM is not set +# CONFIG_NET_SCH_DRR is not set # CONFIG_NET_SCH_INGRESS is not set # @@ -634,6 +671,7 @@ CONFIG_NET_CLS=y # CONFIG_NET_CLS_RSVP is not set # CONFIG_NET_CLS_RSVP6 is not set # CONFIG_NET_CLS_FLOW is not set +# CONFIG_NET_CLS_CGROUP is not set CONFIG_NET_EMATCH=y CONFIG_NET_EMATCH_STACK=32 # CONFIG_NET_EMATCH_CMP is not set @@ -649,7 +687,9 @@ CONFIG_NET_CLS_ACT=y # CONFIG_NET_ACT_NAT is not set # CONFIG_NET_ACT_PEDIT is not set # CONFIG_NET_ACT_SIMP is not set +# CONFIG_NET_ACT_SKBEDIT is not set CONFIG_NET_SCH_FIFO=y +# CONFIG_DCB is not set # # Network testing @@ -666,29 +706,33 @@ CONFIG_HAMRADIO=y # CONFIG_IRDA is not set # CONFIG_BT is not set # CONFIG_AF_RXRPC is not set +# CONFIG_PHONET is not set CONFIG_FIB_RULES=y - -# -# Wireless -# +CONFIG_WIRELESS=y CONFIG_CFG80211=y +# CONFIG_CFG80211_REG_DEBUG is not set CONFIG_NL80211=y +CONFIG_WIRELESS_OLD_REGULATORY=y CONFIG_WIRELESS_EXT=y CONFIG_WIRELESS_EXT_SYSFS=y +# CONFIG_LIB80211 is not set CONFIG_MAC80211=y # # Rate control algorithm selection # -CONFIG_MAC80211_RC_PID=y -CONFIG_MAC80211_RC_DEFAULT_PID=y -CONFIG_MAC80211_RC_DEFAULT="pid" +CONFIG_MAC80211_RC_MINSTREL=y +# CONFIG_MAC80211_RC_DEFAULT_PID is not set +CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y +CONFIG_MAC80211_RC_DEFAULT="minstrel" # CONFIG_MAC80211_MESH is not set CONFIG_MAC80211_LEDS=y # CONFIG_MAC80211_DEBUGFS is not set # CONFIG_MAC80211_DEBUG_MENU is not set -# CONFIG_IEEE80211 is not set -# CONFIG_RFKILL is not set +# CONFIG_WIMAX is not set +CONFIG_RFKILL=y +# CONFIG_RFKILL_INPUT is not set +CONFIG_RFKILL_LEDS=y # CONFIG_NET_9P is not set # @@ -712,7 +756,7 @@ CONFIG_PROC_EVENTS=y # CONFIG_MTD is not set # CONFIG_PARPORT is not set CONFIG_PNP=y -# CONFIG_PNP_DEBUG is not set +CONFIG_PNP_DEBUG_MESSAGES=y # # Protocols @@ -740,21 +784,21 @@ CONFIG_BLK_DEV_RAM_SIZE=16384 CONFIG_MISC_DEVICES=y # CONFIG_IBM_ASM is not set # CONFIG_PHANTOM is not set -# CONFIG_EEPROM_93CX6 is not set # CONFIG_SGI_IOC4 is not set # CONFIG_TIFM_CORE is not set -# CONFIG_ACER_WMI is not set -# CONFIG_ASUS_LAPTOP is not set -# CONFIG_FUJITSU_LAPTOP is not set -# CONFIG_MSI_LAPTOP is not set -# CONFIG_COMPAL_LAPTOP is not set -# CONFIG_SONY_LAPTOP is not set -# CONFIG_THINKPAD_ACPI is not set -# CONFIG_INTEL_MENLOW is not set +# CONFIG_ICS932S401 is not set # CONFIG_ENCLOSURE_SERVICES is not set # CONFIG_SGI_XP is not set # CONFIG_HP_ILO is not set # CONFIG_SGI_GRU is not set +# CONFIG_C2PORT is not set + +# +# EEPROM support +# +# CONFIG_EEPROM_AT24 is not set +# CONFIG_EEPROM_LEGACY is not set +# CONFIG_EEPROM_93CX6 is not set CONFIG_HAVE_IDE=y # CONFIG_IDE is not set @@ -793,7 +837,7 @@ CONFIG_SCSI_WAIT_SCAN=m # CONFIG_SCSI_SPI_ATTRS=y # CONFIG_SCSI_FC_ATTRS is not set -CONFIG_SCSI_ISCSI_ATTRS=y +# CONFIG_SCSI_ISCSI_ATTRS is not set # CONFIG_SCSI_SAS_ATTRS is not set # CONFIG_SCSI_SAS_LIBSAS is not set # CONFIG_SCSI_SRP_ATTRS is not set @@ -864,6 +908,7 @@ CONFIG_PATA_OLDPIIX=y CONFIG_PATA_SCH=y CONFIG_MD=y CONFIG_BLK_DEV_MD=y +CONFIG_MD_AUTODETECT=y # CONFIG_MD_LINEAR is not set # CONFIG_MD_RAID0 is not set # CONFIG_MD_RAID1 is not set @@ -919,6 +964,9 @@ CONFIG_PHYLIB=y # CONFIG_BROADCOM_PHY is not set # CONFIG_ICPLUS_PHY is not set # CONFIG_REALTEK_PHY is not set +# CONFIG_NATIONAL_PHY is not set +# CONFIG_STE10XP is not set +# CONFIG_LSI_ET1011C_PHY is not set # CONFIG_FIXED_PHY is not set # CONFIG_MDIO_BITBANG is not set CONFIG_NET_ETHERNET=y @@ -942,6 +990,9 @@ CONFIG_NET_TULIP=y # CONFIG_IBM_NEW_EMAC_RGMII is not set # CONFIG_IBM_NEW_EMAC_TAH is not set # CONFIG_IBM_NEW_EMAC_EMAC4 is not set +# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set +# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set +# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set @@ -949,7 +1000,6 @@ CONFIG_NET_PCI=y # CONFIG_B44 is not set CONFIG_FORCEDETH=y # CONFIG_FORCEDETH_NAPI is not set -# CONFIG_EEPRO100 is not set CONFIG_E100=y # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set @@ -963,15 +1013,16 @@ CONFIG_8139TOO_PIO=y # CONFIG_R6040 is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set +# CONFIG_SMSC9420 is not set # CONFIG_SUNDANCE is not set # CONFIG_TLAN is not set # CONFIG_VIA_RHINE is not set # CONFIG_SC92031 is not set +# CONFIG_ATL2 is not set CONFIG_NETDEV_1000=y # CONFIG_ACENIC is not set # CONFIG_DL2K is not set CONFIG_E1000=y -# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set # CONFIG_E1000E is not set # CONFIG_IP1000 is not set # CONFIG_IGB is not set @@ -989,18 +1040,23 @@ CONFIG_TIGON3=y # CONFIG_QLA3XXX is not set # CONFIG_ATL1 is not set # CONFIG_ATL1E is not set +# CONFIG_JME is not set CONFIG_NETDEV_10000=y # CONFIG_CHELSIO_T1 is not set +CONFIG_CHELSIO_T3_DEPENDS=y # CONFIG_CHELSIO_T3 is not set +# CONFIG_ENIC is not set # CONFIG_IXGBE is not set # CONFIG_IXGB is not set # CONFIG_S2IO is not set # CONFIG_MYRI10GE is not set # CONFIG_NETXEN_NIC is not set # CONFIG_NIU is not set +# CONFIG_MLX4_EN is not set # CONFIG_MLX4_CORE is not set # CONFIG_TEHUTI is not set # CONFIG_BNX2X is not set +# CONFIG_QLGE is not set # CONFIG_SFC is not set CONFIG_TR=y # CONFIG_IBMOL is not set @@ -1013,9 +1069,8 @@ CONFIG_TR=y # CONFIG_WLAN_PRE80211 is not set CONFIG_WLAN_80211=y # CONFIG_PCMCIA_RAYCS is not set -# CONFIG_IPW2100 is not set -# CONFIG_IPW2200 is not set # CONFIG_LIBERTAS is not set +# CONFIG_LIBERTAS_THINFIRM is not set # CONFIG_AIRO is not set # CONFIG_HERMES is not set # CONFIG_ATMEL is not set @@ -1032,6 +1087,8 @@ CONFIG_WLAN_80211=y CONFIG_ATH5K=y # CONFIG_ATH5K_DEBUG is not set # CONFIG_ATH9K is not set +# CONFIG_IPW2100 is not set +# CONFIG_IPW2200 is not set # CONFIG_IWLCORE is not set # CONFIG_IWLWIFI_LEDS is not set # CONFIG_IWLAGN is not set @@ -1043,6 +1100,10 @@ CONFIG_ATH5K=y # CONFIG_RT2X00 is not set # +# Enable WiMAX (Networking options) to see the WiMAX drivers +# + +# # USB Network Adapters # # CONFIG_USB_CATC is not set @@ -1050,6 +1111,7 @@ CONFIG_ATH5K=y # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set +# CONFIG_USB_HSO is not set CONFIG_NET_PCMCIA=y # CONFIG_PCMCIA_3C589 is not set # CONFIG_PCMCIA_3C574 is not set @@ -1059,6 +1121,7 @@ CONFIG_NET_PCMCIA=y # CONFIG_PCMCIA_SMC91C92 is not set # CONFIG_PCMCIA_XIRC2PS is not set # CONFIG_PCMCIA_AXNET is not set +# CONFIG_PCMCIA_IBMTR is not set # CONFIG_WAN is not set CONFIG_FDDI=y # CONFIG_DEFXX is not set @@ -1110,6 +1173,7 @@ CONFIG_MOUSE_PS2_LOGIPS2PP=y CONFIG_MOUSE_PS2_SYNAPTICS=y CONFIG_MOUSE_PS2_LIFEBOOK=y CONFIG_MOUSE_PS2_TRACKPOINT=y +# CONFIG_MOUSE_PS2_ELANTECH is not set # CONFIG_MOUSE_PS2_TOUCHKIT is not set # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_APPLETOUCH is not set @@ -1147,15 +1211,16 @@ CONFIG_INPUT_TOUCHSCREEN=y # CONFIG_TOUCHSCREEN_FUJITSU is not set # CONFIG_TOUCHSCREEN_GUNZE is not set # CONFIG_TOUCHSCREEN_ELO is not set +# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set # CONFIG_TOUCHSCREEN_MTOUCH is not set # CONFIG_TOUCHSCREEN_INEXIO is not set # CONFIG_TOUCHSCREEN_MK712 is not set # CONFIG_TOUCHSCREEN_PENMOUNT is not set # CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set # CONFIG_TOUCHSCREEN_TOUCHWIN is not set -# CONFIG_TOUCHSCREEN_UCB1400 is not set # CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set # CONFIG_TOUCHSCREEN_TOUCHIT213 is not set +# CONFIG_TOUCHSCREEN_TSC2007 is not set CONFIG_INPUT_MISC=y # CONFIG_INPUT_PCSPKR is not set # CONFIG_INPUT_APANEL is not set @@ -1165,6 +1230,7 @@ CONFIG_INPUT_MISC=y # CONFIG_INPUT_KEYSPAN_REMOTE is not set # CONFIG_INPUT_POWERMATE is not set # CONFIG_INPUT_YEALINK is not set +# CONFIG_INPUT_CM109 is not set # CONFIG_INPUT_UINPUT is not set # @@ -1231,6 +1297,7 @@ CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y +# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set # CONFIG_LEGACY_PTYS is not set # CONFIG_IPMI_HANDLER is not set CONFIG_HW_RANDOM=y @@ -1260,6 +1327,7 @@ CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y # CONFIG_I2C_CHARDEV is not set CONFIG_I2C_HELPER_AUTO=y +CONFIG_I2C_ALGOBIT=y # # I2C Hardware Bus support @@ -1311,8 +1379,6 @@ CONFIG_I2C_I801=y # Miscellaneous I2C Chip support # # CONFIG_DS1682 is not set -# CONFIG_EEPROM_AT24 is not set -# CONFIG_EEPROM_LEGACY is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_PCF8575 is not set # CONFIG_SENSORS_PCA9539 is not set @@ -1331,8 +1397,78 @@ CONFIG_POWER_SUPPLY=y # CONFIG_POWER_SUPPLY_DEBUG is not set # CONFIG_PDA_POWER is not set # CONFIG_BATTERY_DS2760 is not set -# CONFIG_HWMON is not set +# CONFIG_BATTERY_BQ27x00 is not set +CONFIG_HWMON=y +# CONFIG_HWMON_VID is not set +# CONFIG_SENSORS_ABITUGURU is not set +# CONFIG_SENSORS_ABITUGURU3 is not set +# CONFIG_SENSORS_AD7414 is not set +# CONFIG_SENSORS_AD7418 is not set +# CONFIG_SENSORS_ADM1021 is not set +# CONFIG_SENSORS_ADM1025 is not set +# CONFIG_SENSORS_ADM1026 is not set +# CONFIG_SENSORS_ADM1029 is not set +# CONFIG_SENSORS_ADM1031 is not set +# CONFIG_SENSORS_ADM9240 is not set +# CONFIG_SENSORS_ADT7462 is not set +# CONFIG_SENSORS_ADT7470 is not set +# CONFIG_SENSORS_ADT7473 is not set +# CONFIG_SENSORS_ADT7475 is not set +# CONFIG_SENSORS_K8TEMP is not set +# CONFIG_SENSORS_ASB100 is not set +# CONFIG_SENSORS_ATXP1 is not set +# CONFIG_SENSORS_DS1621 is not set +# CONFIG_SENSORS_I5K_AMB is not set +# CONFIG_SENSORS_F71805F is not set +# CONFIG_SENSORS_F71882FG is not set +# CONFIG_SENSORS_F75375S is not set +# CONFIG_SENSORS_FSCHER is not set +# CONFIG_SENSORS_FSCPOS is not set +# CONFIG_SENSORS_FSCHMD is not set +# CONFIG_SENSORS_GL518SM is not set +# CONFIG_SENSORS_GL520SM is not set +# CONFIG_SENSORS_CORETEMP is not set +# CONFIG_SENSORS_IT87 is not set +# CONFIG_SENSORS_LM63 is not set +# CONFIG_SENSORS_LM75 is not set +# CONFIG_SENSORS_LM77 is not set +# CONFIG_SENSORS_LM78 is not set +# CONFIG_SENSORS_LM80 is not set +# CONFIG_SENSORS_LM83 is not set +# CONFIG_SENSORS_LM85 is not set +# CONFIG_SENSORS_LM87 is not set +# CONFIG_SENSORS_LM90 is not set +# CONFIG_SENSORS_LM92 is not set +# CONFIG_SENSORS_LM93 is not set +# CONFIG_SENSORS_LTC4245 is not set +# CONFIG_SENSORS_MAX1619 is not set +# CONFIG_SENSORS_MAX6650 is not set +# CONFIG_SENSORS_PC87360 is not set +# CONFIG_SENSORS_PC87427 is not set +# CONFIG_SENSORS_SIS5595 is not set +# CONFIG_SENSORS_DME1737 is not set +# CONFIG_SENSORS_SMSC47M1 is not set +# CONFIG_SENSORS_SMSC47M192 is not set +# CONFIG_SENSORS_SMSC47B397 is not set +# CONFIG_SENSORS_ADS7828 is not set +# CONFIG_SENSORS_THMC50 is not set +# CONFIG_SENSORS_VIA686A is not set +# CONFIG_SENSORS_VT1211 is not set +# CONFIG_SENSORS_VT8231 is not set +# CONFIG_SENSORS_W83781D is not set +# CONFIG_SENSORS_W83791D is not set +# CONFIG_SENSORS_W83792D is not set +# CONFIG_SENSORS_W83793 is not set +# CONFIG_SENSORS_W83L785TS is not set +# CONFIG_SENSORS_W83L786NG is not set +# CONFIG_SENSORS_W83627HF is not set +# CONFIG_SENSORS_W83627EHF is not set +# CONFIG_SENSORS_HDAPS is not set +# CONFIG_SENSORS_LIS3LV02D is not set +# CONFIG_SENSORS_APPLESMC is not set +# CONFIG_HWMON_DEBUG_CHIP is not set CONFIG_THERMAL=y +# CONFIG_THERMAL_HWMON is not set CONFIG_WATCHDOG=y # CONFIG_WATCHDOG_NOWAYOUT is not set @@ -1352,15 +1488,18 @@ CONFIG_WATCHDOG=y # CONFIG_I6300ESB_WDT is not set # CONFIG_ITCO_WDT is not set # CONFIG_IT8712F_WDT is not set +# CONFIG_IT87_WDT is not set # CONFIG_HP_WATCHDOG is not set # CONFIG_SC1200_WDT is not set # CONFIG_PC87413_WDT is not set # CONFIG_60XX_WDT is not set # CONFIG_SBC8360_WDT is not set # CONFIG_CPU5_WDT is not set +# CONFIG_SMSC_SCH311X_WDT is not set # CONFIG_SMSC37B787_WDT is not set # CONFIG_W83627HF_WDT is not set # CONFIG_W83697HF_WDT is not set +# CONFIG_W83697UG_WDT is not set # CONFIG_W83877F_WDT is not set # CONFIG_W83977F_WDT is not set # CONFIG_MACHZ_WDT is not set @@ -1376,11 +1515,11 @@ CONFIG_WATCHDOG=y # USB-based Watchdog Cards # # CONFIG_USBPCWATCHDOG is not set +CONFIG_SSB_POSSIBLE=y # # Sonics Silicon Backplane # -CONFIG_SSB_POSSIBLE=y # CONFIG_SSB is not set # @@ -1389,7 +1528,13 @@ CONFIG_SSB_POSSIBLE=y # CONFIG_MFD_CORE is not set # CONFIG_MFD_SM501 is not set # CONFIG_HTC_PASIC3 is not set +# CONFIG_TWL4030_CORE is not set # CONFIG_MFD_TMIO is not set +# CONFIG_PMIC_DA903X is not set +# CONFIG_MFD_WM8400 is not set +# CONFIG_MFD_WM8350_I2C is not set +# CONFIG_MFD_PCF50633 is not set +# CONFIG_REGULATOR is not set # # Multimedia devices @@ -1423,6 +1568,7 @@ CONFIG_DRM=y # CONFIG_DRM_I810 is not set # CONFIG_DRM_I830 is not set CONFIG_DRM_I915=y +CONFIG_DRM_I915_KMS=y # CONFIG_DRM_MGA is not set # CONFIG_DRM_SIS is not set # CONFIG_DRM_VIA is not set @@ -1432,6 +1578,7 @@ CONFIG_DRM_I915=y CONFIG_FB=y # CONFIG_FIRMWARE_EDID is not set # CONFIG_FB_DDC is not set +# CONFIG_FB_BOOT_VESA_SUPPORT is not set CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y @@ -1460,7 +1607,6 @@ CONFIG_FB_TILEBLITTING=y # CONFIG_FB_UVESA is not set # CONFIG_FB_VESA is not set CONFIG_FB_EFI=y -# CONFIG_FB_IMAC is not set # CONFIG_FB_N411 is not set # CONFIG_FB_HGA is not set # CONFIG_FB_S1D13XXX is not set @@ -1475,6 +1621,7 @@ CONFIG_FB_EFI=y # CONFIG_FB_S3 is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set +# CONFIG_FB_VIA is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set @@ -1486,12 +1633,15 @@ CONFIG_FB_EFI=y # CONFIG_FB_CARMINE is not set # CONFIG_FB_GEODE is not set # CONFIG_FB_VIRTUAL is not set +# CONFIG_FB_METRONOME is not set +# CONFIG_FB_MB862XX is not set CONFIG_BACKLIGHT_LCD_SUPPORT=y # CONFIG_LCD_CLASS_DEVICE is not set CONFIG_BACKLIGHT_CLASS_DEVICE=y -# CONFIG_BACKLIGHT_CORGI is not set +CONFIG_BACKLIGHT_GENERIC=y # CONFIG_BACKLIGHT_PROGEAR is not set # CONFIG_BACKLIGHT_MBP_NVIDIA is not set +# CONFIG_BACKLIGHT_SAHARA is not set # # Display device support @@ -1511,10 +1661,12 @@ CONFIG_LOGO=y # CONFIG_LOGO_LINUX_VGA16 is not set CONFIG_LOGO_LINUX_CLUT224=y CONFIG_SOUND=y +CONFIG_SOUND_OSS_CORE=y CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_HWDEP=y +CONFIG_SND_JACK=y CONFIG_SND_SEQUENCER=y CONFIG_SND_SEQ_DUMMY=y CONFIG_SND_OSSEMUL=y @@ -1522,6 +1674,8 @@ CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y CONFIG_SND_PCM_OSS_PLUGINS=y CONFIG_SND_SEQUENCER_OSS=y +CONFIG_SND_HRTIMER=y +CONFIG_SND_SEQ_HRTIMER_DEFAULT=y CONFIG_SND_DYNAMIC_MINORS=y CONFIG_SND_SUPPORT_OLD_API=y CONFIG_SND_VERBOSE_PROCFS=y @@ -1575,11 +1729,16 @@ CONFIG_SND_PCI=y # CONFIG_SND_FM801 is not set CONFIG_SND_HDA_INTEL=y CONFIG_SND_HDA_HWDEP=y +# CONFIG_SND_HDA_RECONFIG is not set +# CONFIG_SND_HDA_INPUT_BEEP is not set CONFIG_SND_HDA_CODEC_REALTEK=y CONFIG_SND_HDA_CODEC_ANALOG=y CONFIG_SND_HDA_CODEC_SIGMATEL=y CONFIG_SND_HDA_CODEC_VIA=y CONFIG_SND_HDA_CODEC_ATIHDMI=y +CONFIG_SND_HDA_CODEC_NVHDMI=y +CONFIG_SND_HDA_CODEC_INTELHDMI=y +CONFIG_SND_HDA_ELD=y CONFIG_SND_HDA_CODEC_CONEXANT=y CONFIG_SND_HDA_CODEC_CMEDIA=y CONFIG_SND_HDA_CODEC_SI3054=y @@ -1612,6 +1771,7 @@ CONFIG_SND_USB=y # CONFIG_SND_USB_AUDIO is not set # CONFIG_SND_USB_USX2Y is not set # CONFIG_SND_USB_CAIAQ is not set +# CONFIG_SND_USB_US122L is not set CONFIG_SND_PCMCIA=y # CONFIG_SND_VXPOCKET is not set # CONFIG_SND_PDAUDIOCF is not set @@ -1626,15 +1786,37 @@ CONFIG_HIDRAW=y # USB Input Devices # CONFIG_USB_HID=y -CONFIG_USB_HIDINPUT_POWERBOOK=y -CONFIG_HID_FF=y CONFIG_HID_PID=y +CONFIG_USB_HIDDEV=y + +# +# Special HID drivers +# +CONFIG_HID_COMPAT=y +CONFIG_HID_A4TECH=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_EZKEY=y +CONFIG_HID_GYRATION=y +CONFIG_HID_LOGITECH=y CONFIG_LOGITECH_FF=y # CONFIG_LOGIRUMBLEPAD2_FF is not set +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_NTRIG=y +CONFIG_HID_PANTHERLORD=y CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SONY=y +CONFIG_HID_SUNPLUS=y +# CONFIG_GREENASIA_FF is not set +CONFIG_HID_TOPSEED=y CONFIG_THRUSTMASTER_FF=y CONFIG_ZEROPLUS_FF=y -CONFIG_USB_HIDDEV=y CONFIG_USB_SUPPORT=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y @@ -1652,6 +1834,8 @@ CONFIG_USB_DEVICEFS=y CONFIG_USB_SUSPEND=y # CONFIG_USB_OTG is not set CONFIG_USB_MON=y +# CONFIG_USB_WUSB is not set +# CONFIG_USB_WUSB_CBAF is not set # # USB Host Controller Drivers @@ -1660,6 +1844,7 @@ CONFIG_USB_MON=y CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_ROOT_HUB_TT is not set # CONFIG_USB_EHCI_TT_NEWSCHED is not set +# CONFIG_USB_OXU210HP_HCD is not set # CONFIG_USB_ISP116X_HCD is not set # CONFIG_USB_ISP1760_HCD is not set CONFIG_USB_OHCI_HCD=y @@ -1669,6 +1854,8 @@ CONFIG_USB_OHCI_LITTLE_ENDIAN=y CONFIG_USB_UHCI_HCD=y # CONFIG_USB_SL811_HCD is not set # CONFIG_USB_R8A66597_HCD is not set +# CONFIG_USB_WHCI_HCD is not set +# CONFIG_USB_HWA_HCD is not set # # USB Device Class drivers @@ -1676,20 +1863,20 @@ CONFIG_USB_UHCI_HCD=y # CONFIG_USB_ACM is not set CONFIG_USB_PRINTER=y # CONFIG_USB_WDM is not set +# CONFIG_USB_TMC is not set # -# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may also be needed; # # -# may also be needed; see USB_STORAGE Help for more information +# see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set # CONFIG_USB_STORAGE_DATAFAB is not set # CONFIG_USB_STORAGE_FREECOM is not set # CONFIG_USB_STORAGE_ISD200 is not set -# CONFIG_USB_STORAGE_DPCM is not set # CONFIG_USB_STORAGE_USBAT is not set # CONFIG_USB_STORAGE_SDDR09 is not set # CONFIG_USB_STORAGE_SDDR55 is not set @@ -1697,7 +1884,6 @@ CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_ALAUDA is not set # CONFIG_USB_STORAGE_ONETOUCH is not set # CONFIG_USB_STORAGE_KARMA is not set -# CONFIG_USB_STORAGE_SIERRA is not set # CONFIG_USB_STORAGE_CYPRESS_ATACB is not set CONFIG_USB_LIBUSUAL=y @@ -1718,6 +1904,7 @@ CONFIG_USB_LIBUSUAL=y # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_ADUTUX is not set +# CONFIG_USB_SEVSEG is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set @@ -1735,7 +1922,13 @@ CONFIG_USB_LIBUSUAL=y # CONFIG_USB_IOWARRIOR is not set # CONFIG_USB_TEST is not set # CONFIG_USB_ISIGHTFW is not set +# CONFIG_USB_VST is not set # CONFIG_USB_GADGET is not set + +# +# OTG and related infrastructure +# +# CONFIG_UWB is not set # CONFIG_MMC is not set # CONFIG_MEMSTICK is not set CONFIG_NEW_LEDS=y @@ -1744,6 +1937,7 @@ CONFIG_LEDS_CLASS=y # # LED drivers # +# CONFIG_LEDS_ALIX2 is not set # CONFIG_LEDS_PCA9532 is not set # CONFIG_LEDS_CLEVO_MAIL is not set # CONFIG_LEDS_PCA955X is not set @@ -1754,6 +1948,7 @@ CONFIG_LEDS_CLASS=y CONFIG_LEDS_TRIGGERS=y # CONFIG_LEDS_TRIGGER_TIMER is not set # CONFIG_LEDS_TRIGGER_HEARTBEAT is not set +# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set # CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set # CONFIG_ACCESSIBILITY is not set # CONFIG_INFINIBAND is not set @@ -1793,6 +1988,7 @@ CONFIG_RTC_INTF_DEV=y # CONFIG_RTC_DRV_M41T80 is not set # CONFIG_RTC_DRV_S35390A is not set # CONFIG_RTC_DRV_FM3130 is not set +# CONFIG_RTC_DRV_RX8581 is not set # # SPI RTC drivers @@ -1802,12 +1998,15 @@ CONFIG_RTC_INTF_DEV=y # Platform RTC drivers # CONFIG_RTC_DRV_CMOS=y +# CONFIG_RTC_DRV_DS1286 is not set # CONFIG_RTC_DRV_DS1511 is not set # CONFIG_RTC_DRV_DS1553 is not set # CONFIG_RTC_DRV_DS1742 is not set # CONFIG_RTC_DRV_STK17TA8 is not set # CONFIG_RTC_DRV_M48T86 is not set +# CONFIG_RTC_DRV_M48T35 is not set # CONFIG_RTC_DRV_M48T59 is not set +# CONFIG_RTC_DRV_BQ4802 is not set # CONFIG_RTC_DRV_V3020 is not set # @@ -1820,6 +2019,21 @@ CONFIG_DMADEVICES=y # # CONFIG_INTEL_IOATDMA is not set # CONFIG_UIO is not set +# CONFIG_STAGING is not set +CONFIG_X86_PLATFORM_DEVICES=y +# CONFIG_ACER_WMI is not set +# CONFIG_ASUS_LAPTOP is not set +# CONFIG_FUJITSU_LAPTOP is not set +# CONFIG_MSI_LAPTOP is not set +# CONFIG_PANASONIC_LAPTOP is not set +# CONFIG_COMPAL_LAPTOP is not set +# CONFIG_SONY_LAPTOP is not set +# CONFIG_THINKPAD_ACPI is not set +# CONFIG_INTEL_MENLOW is not set +CONFIG_EEEPC_LAPTOP=y +# CONFIG_ACPI_WMI is not set +# CONFIG_ACPI_ASUS is not set +# CONFIG_ACPI_TOSHIBA is not set # # Firmware Drivers @@ -1830,8 +2044,7 @@ CONFIG_EFI_VARS=y # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_DMIID=y -CONFIG_ISCSI_IBFT_FIND=y -CONFIG_ISCSI_IBFT=y +# CONFIG_ISCSI_IBFT_FIND is not set # # File systems @@ -1841,22 +2054,25 @@ CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y CONFIG_EXT3_FS_SECURITY=y -# CONFIG_EXT4DEV_FS is not set +# CONFIG_EXT4_FS is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y +CONFIG_FILE_LOCKING=y # CONFIG_XFS_FS is not set # CONFIG_GFS2_FS is not set # CONFIG_OCFS2_FS is not set +# CONFIG_BTRFS_FS is not set CONFIG_DNOTIFY=y CONFIG_INOTIFY=y CONFIG_INOTIFY_USER=y CONFIG_QUOTA=y CONFIG_QUOTA_NETLINK_INTERFACE=y # CONFIG_PRINT_QUOTA_WARNING is not set +CONFIG_QUOTA_TREE=y # CONFIG_QFMT_V1 is not set CONFIG_QFMT_V2=y CONFIG_QUOTACTL=y @@ -1890,16 +2106,14 @@ CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_PROC_VMCORE=y CONFIG_PROC_SYSCTL=y +CONFIG_PROC_PAGE_MONITOR=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y # CONFIG_CONFIGFS_FS is not set - -# -# Miscellaneous filesystems -# +CONFIG_MISC_FILESYSTEMS=y # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_ECRYPT_FS is not set @@ -1909,6 +2123,7 @@ CONFIG_HUGETLB_PAGE=y # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set +# CONFIG_SQUASHFS is not set # CONFIG_VXFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_OMFS_FS is not set @@ -1930,6 +2145,7 @@ CONFIG_NFS_ACL_SUPPORT=y CONFIG_NFS_COMMON=y CONFIG_SUNRPC=y CONFIG_SUNRPC_GSS=y +# CONFIG_SUNRPC_REGISTER_V4 is not set CONFIG_RPCSEC_GSS_KRB5=y # CONFIG_RPCSEC_GSS_SPKM3 is not set # CONFIG_SMB_FS is not set @@ -2006,7 +2222,7 @@ CONFIG_NLS_UTF8=y # CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_PRINTK_TIME=y -CONFIG_ENABLE_WARN_DEPRECATED=y +# CONFIG_ENABLE_WARN_DEPRECATED is not set CONFIG_ENABLE_MUST_CHECK=y CONFIG_FRAME_WARN=2048 CONFIG_MAGIC_SYSRQ=y @@ -2035,40 +2251,60 @@ CONFIG_TIMER_STATS=y CONFIG_DEBUG_BUGVERBOSE=y # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_VM is not set +# CONFIG_DEBUG_VIRTUAL is not set # CONFIG_DEBUG_WRITECOUNT is not set CONFIG_DEBUG_MEMORY_INIT=y # CONFIG_DEBUG_LIST is not set # CONFIG_DEBUG_SG is not set +# CONFIG_DEBUG_NOTIFIERS is not set +CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_FRAME_POINTER=y # CONFIG_BOOT_PRINTK_DELAY is not set # CONFIG_RCU_TORTURE_TEST is not set +# CONFIG_RCU_CPU_STALL_DETECTOR is not set # CONFIG_KPROBES_SANITY_TEST is not set # CONFIG_BACKTRACE_SELF_TEST is not set +# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set # CONFIG_LKDTM is not set # CONFIG_FAULT_INJECTION is not set # CONFIG_LATENCYTOP is not set CONFIG_SYSCTL_SYSCALL_CHECK=y -CONFIG_HAVE_FTRACE=y +CONFIG_USER_STACKTRACE_SUPPORT=y +CONFIG_HAVE_FUNCTION_TRACER=y +CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y +CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y -# CONFIG_FTRACE is not set +CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y +CONFIG_HAVE_HW_BRANCH_TRACER=y + +# +# Tracers +# +# CONFIG_FUNCTION_TRACER is not set # CONFIG_IRQSOFF_TRACER is not set # CONFIG_SYSPROF_TRACER is not set # CONFIG_SCHED_TRACER is not set # CONFIG_CONTEXT_SWITCH_TRACER is not set +# CONFIG_BOOT_TRACER is not set +# CONFIG_TRACE_BRANCH_PROFILING is not set +# CONFIG_POWER_TRACER is not set +# CONFIG_STACK_TRACER is not set +# CONFIG_HW_BRANCH_TRACER is not set CONFIG_PROVIDE_OHCI1394_DMA_INIT=y +# CONFIG_DYNAMIC_PRINTK_DEBUG is not set # CONFIG_SAMPLES is not set CONFIG_HAVE_ARCH_KGDB=y # CONFIG_KGDB is not set # CONFIG_STRICT_DEVMEM is not set CONFIG_X86_VERBOSE_BOOTUP=y CONFIG_EARLY_PRINTK=y +CONFIG_EARLY_PRINTK_DBGP=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DEBUG_STACK_USAGE=y # CONFIG_DEBUG_PAGEALLOC is not set # CONFIG_DEBUG_PER_CPU_MAPS is not set # CONFIG_X86_PTDUMP is not set CONFIG_DEBUG_RODATA=y -# CONFIG_DIRECT_GBPAGES is not set # CONFIG_DEBUG_RODATA_TEST is not set CONFIG_DEBUG_NX_TEST=m # CONFIG_IOMMU_DEBUG is not set @@ -2092,8 +2328,10 @@ CONFIG_OPTIMIZE_INLINING=y CONFIG_KEYS=y CONFIG_KEYS_DEBUG_PROC_KEYS=y CONFIG_SECURITY=y +# CONFIG_SECURITYFS is not set CONFIG_SECURITY_NETWORK=y # CONFIG_SECURITY_NETWORK_XFRM is not set +# CONFIG_SECURITY_PATH is not set CONFIG_SECURITY_FILE_CAPABILITIES=y # CONFIG_SECURITY_ROOTPLUG is not set CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=65536 @@ -2104,7 +2342,6 @@ CONFIG_SECURITY_SELINUX_DISABLE=y CONFIG_SECURITY_SELINUX_DEVELOP=y CONFIG_SECURITY_SELINUX_AVC_STATS=y CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1 -# CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT is not set # CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set # CONFIG_SECURITY_SMACK is not set CONFIG_CRYPTO=y @@ -2112,11 +2349,18 @@ CONFIG_CRYPTO=y # # Crypto core or helper # +# CONFIG_CRYPTO_FIPS is not set CONFIG_CRYPTO_ALGAPI=y +CONFIG_CRYPTO_ALGAPI2=y CONFIG_CRYPTO_AEAD=y +CONFIG_CRYPTO_AEAD2=y CONFIG_CRYPTO_BLKCIPHER=y +CONFIG_CRYPTO_BLKCIPHER2=y CONFIG_CRYPTO_HASH=y +CONFIG_CRYPTO_HASH2=y +CONFIG_CRYPTO_RNG2=y CONFIG_CRYPTO_MANAGER=y +CONFIG_CRYPTO_MANAGER2=y # CONFIG_CRYPTO_GF128MUL is not set # CONFIG_CRYPTO_NULL is not set # CONFIG_CRYPTO_CRYPTD is not set @@ -2151,6 +2395,7 @@ CONFIG_CRYPTO_HMAC=y # Digest # # CONFIG_CRYPTO_CRC32C is not set +# CONFIG_CRYPTO_CRC32C_INTEL is not set # CONFIG_CRYPTO_MD4 is not set CONFIG_CRYPTO_MD5=y # CONFIG_CRYPTO_MICHAEL_MIC is not set @@ -2191,6 +2436,11 @@ CONFIG_CRYPTO_DES=y # # CONFIG_CRYPTO_DEFLATE is not set # CONFIG_CRYPTO_LZO is not set + +# +# Random Number Generation +# +# CONFIG_CRYPTO_ANSI_CPRNG is not set CONFIG_CRYPTO_HW=y # CONFIG_CRYPTO_DEV_HIFN_795X is not set CONFIG_HAVE_KVM=y @@ -2205,6 +2455,7 @@ CONFIG_VIRTUALIZATION=y CONFIG_BITREVERSE=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_FIND_NEXT_BIT=y +CONFIG_GENERIC_FIND_LAST_BIT=y # CONFIG_CRC_CCITT is not set # CONFIG_CRC16 is not set CONFIG_CRC_T10DIF=y Index: linux-2.6-tip/arch/x86/ia32/ia32_signal.c =================================================================== --- linux-2.6-tip.orig/arch/x86/ia32/ia32_signal.c +++ linux-2.6-tip/arch/x86/ia32/ia32_signal.c @@ -33,8 +33,6 @@ #include #include -#define DEBUG_SIG 0 - #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP))) #define FIX_EFLAGS (X86_EFLAGS_AC | X86_EFLAGS_OF | \ @@ -46,78 +44,83 @@ void signal_fault(struct pt_regs *regs, int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from) { - int err; + int err = 0; if (!access_ok(VERIFY_WRITE, to, sizeof(compat_siginfo_t))) return -EFAULT; - /* If you change siginfo_t structure, please make sure that - this code is fixed accordingly. - It should never copy any pad contained in the structure - to avoid security leaks, but must copy the generic - 3 ints plus the relevant union member. */ - err = __put_user(from->si_signo, &to->si_signo); - err |= __put_user(from->si_errno, &to->si_errno); - err |= __put_user((short)from->si_code, &to->si_code); - - if (from->si_code < 0) { - err |= __put_user(from->si_pid, &to->si_pid); - err |= __put_user(from->si_uid, &to->si_uid); - err |= __put_user(ptr_to_compat(from->si_ptr), &to->si_ptr); - } else { - /* - * First 32bits of unions are always present: - * si_pid === si_band === si_tid === si_addr(LS half) - */ - err |= __put_user(from->_sifields._pad[0], - &to->_sifields._pad[0]); - switch (from->si_code >> 16) { - case __SI_FAULT >> 16: - break; - case __SI_CHLD >> 16: - err |= __put_user(from->si_utime, &to->si_utime); - err |= __put_user(from->si_stime, &to->si_stime); - err |= __put_user(from->si_status, &to->si_status); - /* FALL THROUGH */ - default: - case __SI_KILL >> 16: - err |= __put_user(from->si_uid, &to->si_uid); - break; - case __SI_POLL >> 16: - err |= __put_user(from->si_fd, &to->si_fd); - break; - case __SI_TIMER >> 16: - err |= __put_user(from->si_overrun, &to->si_overrun); - err |= __put_user(ptr_to_compat(from->si_ptr), - &to->si_ptr); - break; - /* This is not generated by the kernel as of now. */ - case __SI_RT >> 16: - case __SI_MESGQ >> 16: - err |= __put_user(from->si_uid, &to->si_uid); - err |= __put_user(from->si_int, &to->si_int); - break; + put_user_try { + /* If you change siginfo_t structure, please make sure that + this code is fixed accordingly. + It should never copy any pad contained in the structure + to avoid security leaks, but must copy the generic + 3 ints plus the relevant union member. */ + put_user_ex(from->si_signo, &to->si_signo); + put_user_ex(from->si_errno, &to->si_errno); + put_user_ex((short)from->si_code, &to->si_code); + + if (from->si_code < 0) { + put_user_ex(from->si_pid, &to->si_pid); + put_user_ex(from->si_uid, &to->si_uid); + put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr); + } else { + /* + * First 32bits of unions are always present: + * si_pid === si_band === si_tid === si_addr(LS half) + */ + put_user_ex(from->_sifields._pad[0], + &to->_sifields._pad[0]); + switch (from->si_code >> 16) { + case __SI_FAULT >> 16: + break; + case __SI_CHLD >> 16: + put_user_ex(from->si_utime, &to->si_utime); + put_user_ex(from->si_stime, &to->si_stime); + put_user_ex(from->si_status, &to->si_status); + /* FALL THROUGH */ + default: + case __SI_KILL >> 16: + put_user_ex(from->si_uid, &to->si_uid); + break; + case __SI_POLL >> 16: + put_user_ex(from->si_fd, &to->si_fd); + break; + case __SI_TIMER >> 16: + put_user_ex(from->si_overrun, &to->si_overrun); + put_user_ex(ptr_to_compat(from->si_ptr), + &to->si_ptr); + break; + /* This is not generated by the kernel as of now. */ + case __SI_RT >> 16: + case __SI_MESGQ >> 16: + put_user_ex(from->si_uid, &to->si_uid); + put_user_ex(from->si_int, &to->si_int); + break; + } } - } + } put_user_catch(err); + return err; } int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from) { - int err; + int err = 0; u32 ptr32; if (!access_ok(VERIFY_READ, from, sizeof(compat_siginfo_t))) return -EFAULT; - err = __get_user(to->si_signo, &from->si_signo); - err |= __get_user(to->si_errno, &from->si_errno); - err |= __get_user(to->si_code, &from->si_code); - - err |= __get_user(to->si_pid, &from->si_pid); - err |= __get_user(to->si_uid, &from->si_uid); - err |= __get_user(ptr32, &from->si_ptr); - to->si_ptr = compat_ptr(ptr32); + get_user_try { + get_user_ex(to->si_signo, &from->si_signo); + get_user_ex(to->si_errno, &from->si_errno); + get_user_ex(to->si_code, &from->si_code); + + get_user_ex(to->si_pid, &from->si_pid); + get_user_ex(to->si_uid, &from->si_uid); + get_user_ex(ptr32, &from->si_ptr); + to->si_ptr = compat_ptr(ptr32); + } get_user_catch(err); return err; } @@ -142,17 +145,23 @@ asmlinkage long sys32_sigaltstack(const struct pt_regs *regs) { stack_t uss, uoss; - int ret; + int ret, err = 0; mm_segment_t seg; if (uss_ptr) { u32 ptr; memset(&uss, 0, sizeof(stack_t)); - if (!access_ok(VERIFY_READ, uss_ptr, sizeof(stack_ia32_t)) || - __get_user(ptr, &uss_ptr->ss_sp) || - __get_user(uss.ss_flags, &uss_ptr->ss_flags) || - __get_user(uss.ss_size, &uss_ptr->ss_size)) + if (!access_ok(VERIFY_READ, uss_ptr, sizeof(stack_ia32_t))) + return -EFAULT; + + get_user_try { + get_user_ex(ptr, &uss_ptr->ss_sp); + get_user_ex(uss.ss_flags, &uss_ptr->ss_flags); + get_user_ex(uss.ss_size, &uss_ptr->ss_size); + } get_user_catch(err); + + if (err) return -EFAULT; uss.ss_sp = compat_ptr(ptr); } @@ -161,10 +170,16 @@ asmlinkage long sys32_sigaltstack(const ret = do_sigaltstack(uss_ptr ? &uss : NULL, &uoss, regs->sp); set_fs(seg); if (ret >= 0 && uoss_ptr) { - if (!access_ok(VERIFY_WRITE, uoss_ptr, sizeof(stack_ia32_t)) || - __put_user(ptr_to_compat(uoss.ss_sp), &uoss_ptr->ss_sp) || - __put_user(uoss.ss_flags, &uoss_ptr->ss_flags) || - __put_user(uoss.ss_size, &uoss_ptr->ss_size)) + if (!access_ok(VERIFY_WRITE, uoss_ptr, sizeof(stack_ia32_t))) + return -EFAULT; + + put_user_try { + put_user_ex(ptr_to_compat(uoss.ss_sp), &uoss_ptr->ss_sp); + put_user_ex(uoss.ss_flags, &uoss_ptr->ss_flags); + put_user_ex(uoss.ss_size, &uoss_ptr->ss_size); + } put_user_catch(err); + + if (err) ret = -EFAULT; } return ret; @@ -173,75 +188,78 @@ asmlinkage long sys32_sigaltstack(const /* * Do a signal return; undo the signal stack. */ +#define loadsegment_gs(v) load_gs_index(v) +#define loadsegment_fs(v) loadsegment(fs, v) +#define loadsegment_ds(v) loadsegment(ds, v) +#define loadsegment_es(v) loadsegment(es, v) + +#define get_user_seg(seg) ({ unsigned int v; savesegment(seg, v); v; }) +#define set_user_seg(seg, v) loadsegment_##seg(v) + #define COPY(x) { \ - err |= __get_user(regs->x, &sc->x); \ + get_user_ex(regs->x, &sc->x); \ } -#define COPY_SEG_CPL3(seg) { \ - unsigned short tmp; \ - err |= __get_user(tmp, &sc->seg); \ - regs->seg = tmp | 3; \ -} +#define GET_SEG(seg) ({ \ + unsigned short tmp; \ + get_user_ex(tmp, &sc->seg); \ + tmp; \ +}) + +#define COPY_SEG_CPL3(seg) do { \ + regs->seg = GET_SEG(seg) | 3; \ +} while (0) #define RELOAD_SEG(seg) { \ - unsigned int cur, pre; \ - err |= __get_user(pre, &sc->seg); \ - savesegment(seg, cur); \ + unsigned int pre = GET_SEG(seg); \ + unsigned int cur = get_user_seg(seg); \ pre |= 3; \ if (pre != cur) \ - loadsegment(seg, pre); \ + set_user_seg(seg, pre); \ } static int ia32_restore_sigcontext(struct pt_regs *regs, struct sigcontext_ia32 __user *sc, unsigned int *pax) { - unsigned int tmpflags, gs, oldgs, err = 0; + unsigned int tmpflags, err = 0; void __user *buf; u32 tmp; /* Always make any pending restarted system calls return -EINTR */ current_thread_info()->restart_block.fn = do_no_restart_syscall; -#if DEBUG_SIG - printk(KERN_DEBUG "SIG restore_sigcontext: " - "sc=%p err(%x) eip(%x) cs(%x) flg(%x)\n", - sc, sc->err, sc->ip, sc->cs, sc->flags); -#endif - - /* - * Reload fs and gs if they have changed in the signal - * handler. This does not handle long fs/gs base changes in - * the handler, but does not clobber them at least in the - * normal case. - */ - err |= __get_user(gs, &sc->gs); - gs |= 3; - savesegment(gs, oldgs); - if (gs != oldgs) - load_gs_index(gs); - - RELOAD_SEG(fs); - RELOAD_SEG(ds); - RELOAD_SEG(es); - - COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx); - COPY(dx); COPY(cx); COPY(ip); - /* Don't touch extended registers */ - - COPY_SEG_CPL3(cs); - COPY_SEG_CPL3(ss); - - err |= __get_user(tmpflags, &sc->flags); - regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS); - /* disable syscall checks */ - regs->orig_ax = -1; - - err |= __get_user(tmp, &sc->fpstate); - buf = compat_ptr(tmp); - err |= restore_i387_xstate_ia32(buf); + get_user_try { + /* + * Reload fs and gs if they have changed in the signal + * handler. This does not handle long fs/gs base changes in + * the handler, but does not clobber them at least in the + * normal case. + */ + RELOAD_SEG(gs); + RELOAD_SEG(fs); + RELOAD_SEG(ds); + RELOAD_SEG(es); + + COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx); + COPY(dx); COPY(cx); COPY(ip); + /* Don't touch extended registers */ + + COPY_SEG_CPL3(cs); + COPY_SEG_CPL3(ss); + + get_user_ex(tmpflags, &sc->flags); + regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS); + /* disable syscall checks */ + regs->orig_ax = -1; + + get_user_ex(tmp, &sc->fpstate); + buf = compat_ptr(tmp); + err |= restore_i387_xstate_ia32(buf); + + get_user_ex(*pax, &sc->ax); + } get_user_catch(err); - err |= __get_user(*pax, &sc->ax); return err; } @@ -317,38 +335,36 @@ static int ia32_setup_sigcontext(struct void __user *fpstate, struct pt_regs *regs, unsigned int mask) { - int tmp, err = 0; + int err = 0; - savesegment(gs, tmp); - err |= __put_user(tmp, (unsigned int __user *)&sc->gs); - savesegment(fs, tmp); - err |= __put_user(tmp, (unsigned int __user *)&sc->fs); - savesegment(ds, tmp); - err |= __put_user(tmp, (unsigned int __user *)&sc->ds); - savesegment(es, tmp); - err |= __put_user(tmp, (unsigned int __user *)&sc->es); - - err |= __put_user(regs->di, &sc->di); - err |= __put_user(regs->si, &sc->si); - err |= __put_user(regs->bp, &sc->bp); - err |= __put_user(regs->sp, &sc->sp); - err |= __put_user(regs->bx, &sc->bx); - err |= __put_user(regs->dx, &sc->dx); - err |= __put_user(regs->cx, &sc->cx); - err |= __put_user(regs->ax, &sc->ax); - err |= __put_user(current->thread.trap_no, &sc->trapno); - err |= __put_user(current->thread.error_code, &sc->err); - err |= __put_user(regs->ip, &sc->ip); - err |= __put_user(regs->cs, (unsigned int __user *)&sc->cs); - err |= __put_user(regs->flags, &sc->flags); - err |= __put_user(regs->sp, &sc->sp_at_signal); - err |= __put_user(regs->ss, (unsigned int __user *)&sc->ss); - - err |= __put_user(ptr_to_compat(fpstate), &sc->fpstate); - - /* non-iBCS2 extensions.. */ - err |= __put_user(mask, &sc->oldmask); - err |= __put_user(current->thread.cr2, &sc->cr2); + put_user_try { + put_user_ex(get_user_seg(gs), (unsigned int __user *)&sc->gs); + put_user_ex(get_user_seg(fs), (unsigned int __user *)&sc->fs); + put_user_ex(get_user_seg(ds), (unsigned int __user *)&sc->ds); + put_user_ex(get_user_seg(es), (unsigned int __user *)&sc->es); + + put_user_ex(regs->di, &sc->di); + put_user_ex(regs->si, &sc->si); + put_user_ex(regs->bp, &sc->bp); + put_user_ex(regs->sp, &sc->sp); + put_user_ex(regs->bx, &sc->bx); + put_user_ex(regs->dx, &sc->dx); + put_user_ex(regs->cx, &sc->cx); + put_user_ex(regs->ax, &sc->ax); + put_user_ex(current->thread.trap_no, &sc->trapno); + put_user_ex(current->thread.error_code, &sc->err); + put_user_ex(regs->ip, &sc->ip); + put_user_ex(regs->cs, (unsigned int __user *)&sc->cs); + put_user_ex(regs->flags, &sc->flags); + put_user_ex(regs->sp, &sc->sp_at_signal); + put_user_ex(regs->ss, (unsigned int __user *)&sc->ss); + + put_user_ex(ptr_to_compat(fpstate), &sc->fpstate); + + /* non-iBCS2 extensions.. */ + put_user_ex(mask, &sc->oldmask); + put_user_ex(current->thread.cr2, &sc->cr2); + } put_user_catch(err); return err; } @@ -437,13 +453,17 @@ int ia32_setup_frame(int sig, struct k_s else restorer = &frame->retcode; } - err |= __put_user(ptr_to_compat(restorer), &frame->pretcode); - /* - * These are actually not used anymore, but left because some - * gdb versions depend on them as a marker. - */ - err |= __put_user(*((u64 *)&code), (u64 *)frame->retcode); + put_user_try { + put_user_ex(ptr_to_compat(restorer), &frame->pretcode); + + /* + * These are actually not used anymore, but left because some + * gdb versions depend on them as a marker. + */ + put_user_ex(*((u64 *)&code), (u64 *)frame->retcode); + } put_user_catch(err); + if (err) return -EFAULT; @@ -462,11 +482,6 @@ int ia32_setup_frame(int sig, struct k_s regs->cs = __USER32_CS; regs->ss = __USER32_DS; -#if DEBUG_SIG - printk(KERN_DEBUG "SIG deliver (%s:%d): sp=%p pc=%lx ra=%u\n", - current->comm, current->pid, frame, regs->ip, frame->pretcode); -#endif - return 0; } @@ -496,41 +511,40 @@ int ia32_setup_rt_frame(int sig, struct if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame))) return -EFAULT; - err |= __put_user(sig, &frame->sig); - err |= __put_user(ptr_to_compat(&frame->info), &frame->pinfo); - err |= __put_user(ptr_to_compat(&frame->uc), &frame->puc); - err |= copy_siginfo_to_user32(&frame->info, info); - if (err) - return -EFAULT; + put_user_try { + put_user_ex(sig, &frame->sig); + put_user_ex(ptr_to_compat(&frame->info), &frame->pinfo); + put_user_ex(ptr_to_compat(&frame->uc), &frame->puc); + err |= copy_siginfo_to_user32(&frame->info, info); + + /* Create the ucontext. */ + if (cpu_has_xsave) + put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags); + else + put_user_ex(0, &frame->uc.uc_flags); + put_user_ex(0, &frame->uc.uc_link); + put_user_ex(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp); + put_user_ex(sas_ss_flags(regs->sp), + &frame->uc.uc_stack.ss_flags); + put_user_ex(current->sas_ss_size, &frame->uc.uc_stack.ss_size); + err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate, + regs, set->sig[0]); + err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)); - /* Create the ucontext. */ - if (cpu_has_xsave) - err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags); - else - err |= __put_user(0, &frame->uc.uc_flags); - err |= __put_user(0, &frame->uc.uc_link); - err |= __put_user(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp); - err |= __put_user(sas_ss_flags(regs->sp), - &frame->uc.uc_stack.ss_flags); - err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size); - err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate, - regs, set->sig[0]); - err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)); - if (err) - return -EFAULT; + if (ka->sa.sa_flags & SA_RESTORER) + restorer = ka->sa.sa_restorer; + else + restorer = VDSO32_SYMBOL(current->mm->context.vdso, + rt_sigreturn); + put_user_ex(ptr_to_compat(restorer), &frame->pretcode); + + /* + * Not actually used anymore, but left because some gdb + * versions need it. + */ + put_user_ex(*((u64 *)&code), (u64 *)frame->retcode); + } put_user_catch(err); - if (ka->sa.sa_flags & SA_RESTORER) - restorer = ka->sa.sa_restorer; - else - restorer = VDSO32_SYMBOL(current->mm->context.vdso, - rt_sigreturn); - err |= __put_user(ptr_to_compat(restorer), &frame->pretcode); - - /* - * Not actually used anymore, but left because some gdb - * versions need it. - */ - err |= __put_user(*((u64 *)&code), (u64 *)frame->retcode); if (err) return -EFAULT; @@ -549,10 +563,5 @@ int ia32_setup_rt_frame(int sig, struct regs->cs = __USER32_CS; regs->ss = __USER32_DS; -#if DEBUG_SIG - printk(KERN_DEBUG "SIG deliver (%s:%d): sp=%p pc=%lx ra=%u\n", - current->comm, current->pid, frame, regs->ip, frame->pretcode); -#endif - return 0; } Index: linux-2.6-tip/arch/x86/ia32/ia32entry.S =================================================================== --- linux-2.6-tip.orig/arch/x86/ia32/ia32entry.S +++ linux-2.6-tip/arch/x86/ia32/ia32entry.S @@ -112,8 +112,8 @@ ENTRY(ia32_sysenter_target) CFI_DEF_CFA rsp,0 CFI_REGISTER rsp,rbp SWAPGS_UNSAFE_STACK - movq %gs:pda_kernelstack, %rsp - addq $(PDA_STACKOFFSET),%rsp + movq PER_CPU_VAR(kernel_stack), %rsp + addq $(KERNEL_STACK_OFFSET),%rsp /* * No need to follow this irqs on/off section: the syscall * disabled irqs, here we enable it straight after entry: @@ -273,13 +273,13 @@ ENDPROC(ia32_sysenter_target) ENTRY(ia32_cstar_target) CFI_STARTPROC32 simple CFI_SIGNAL_FRAME - CFI_DEF_CFA rsp,PDA_STACKOFFSET + CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET CFI_REGISTER rip,rcx /*CFI_REGISTER rflags,r11*/ SWAPGS_UNSAFE_STACK movl %esp,%r8d CFI_REGISTER rsp,r8 - movq %gs:pda_kernelstack,%rsp + movq PER_CPU_VAR(kernel_stack),%rsp /* * No need to follow this irqs on/off section: the syscall * disabled irqs and here we enable it straight after entry: @@ -825,7 +825,11 @@ ia32_sys_call_table: .quad compat_sys_signalfd4 .quad sys_eventfd2 .quad sys_epoll_create1 - .quad sys_dup3 /* 330 */ + .quad sys_dup3 /* 330 */ .quad sys_pipe2 .quad sys_inotify_init1 + .quad quiet_ni_syscall /* preadv */ + .quad quiet_ni_syscall /* pwritev */ + .quad compat_sys_rt_tgsigqueueinfo /* 335 */ + .quad sys_perf_counter_open ia32_syscall_end: Index: linux-2.6-tip/arch/x86/include/asm/a.out-core.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/a.out-core.h +++ linux-2.6-tip/arch/x86/include/asm/a.out-core.h @@ -55,7 +55,7 @@ static inline void aout_dump_thread(stru dump->regs.ds = (u16)regs->ds; dump->regs.es = (u16)regs->es; dump->regs.fs = (u16)regs->fs; - savesegment(gs, dump->regs.gs); + dump->regs.gs = get_user_gs(regs); dump->regs.orig_ax = regs->orig_ax; dump->regs.ip = regs->ip; dump->regs.cs = (u16)regs->cs; Index: linux-2.6-tip/arch/x86/include/asm/acpi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/acpi.h +++ linux-2.6-tip/arch/x86/include/asm/acpi.h @@ -50,8 +50,8 @@ #define ACPI_ASM_MACROS #define BREAKPOINT3 -#define ACPI_DISABLE_IRQS() local_irq_disable() -#define ACPI_ENABLE_IRQS() local_irq_enable() +#define ACPI_DISABLE_IRQS() local_irq_disable_nort() +#define ACPI_ENABLE_IRQS() local_irq_enable_nort() #define ACPI_FLUSH_CPU_CACHE() wbinvd() int __acpi_acquire_global_lock(unsigned int *lock); @@ -102,9 +102,6 @@ static inline void disable_acpi(void) acpi_noirq = 1; } -/* Fixmap pages to reserve for ACPI boot-time tables (see fixmap.h) */ -#define FIX_ACPI_PAGES 4 - extern int acpi_gsi_to_irq(u32 gsi, unsigned int *irq); static inline void acpi_noirq_set(void) { acpi_noirq = 1; } Index: linux-2.6-tip/arch/x86/include/asm/apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/apic.h +++ linux-2.6-tip/arch/x86/include/asm/apic.h @@ -1,15 +1,18 @@ #ifndef _ASM_X86_APIC_H #define _ASM_X86_APIC_H -#include +#include #include +#include #include -#include -#include +#include #include +#include +#include +#include +#include #include -#include #include #define ARCH_APICTIMER_STOPS_ON_C3 1 @@ -33,7 +36,13 @@ } while (0) +#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32) extern void generic_apic_probe(void); +#else +static inline void generic_apic_probe(void) +{ +} +#endif #ifdef CONFIG_X86_LOCAL_APIC @@ -41,6 +50,21 @@ extern unsigned int apic_verbosity; extern int local_apic_timer_c2_ok; extern int disable_apic; + +#ifdef CONFIG_SMP +extern void __inquire_remote_apic(int apicid); +#else /* CONFIG_SMP */ +static inline void __inquire_remote_apic(int apicid) +{ +} +#endif /* CONFIG_SMP */ + +static inline void default_inquire_remote_apic(int apicid) +{ + if (apic_verbosity >= APIC_DEBUG) + __inquire_remote_apic(apicid); +} + /* * Basic functions accessing APICs. */ @@ -51,7 +75,14 @@ extern int disable_apic; #define setup_secondary_clock setup_secondary_APIC_clock #endif +#ifdef CONFIG_X86_VSMP extern int is_vsmp_box(void); +#else +static inline int is_vsmp_box(void) +{ + return 0; +} +#endif extern void xapic_wait_icr_idle(void); extern u32 safe_xapic_wait_icr_idle(void); extern void xapic_icr_write(u32, u32); @@ -71,6 +102,22 @@ static inline u32 native_apic_mem_read(u return *((volatile u32 *)(APIC_BASE + reg)); } +extern void native_apic_wait_icr_idle(void); +extern u32 native_safe_apic_wait_icr_idle(void); +extern void native_apic_icr_write(u32 low, u32 id); +extern u64 native_apic_icr_read(void); + +#ifdef CONFIG_X86_X2APIC +/* + * Make previous memory operations globally visible before + * sending the IPI through x2apic wrmsr. We need a serializing instruction or + * mfence for this. + */ +static inline void x2apic_wrmsr_fence(void) +{ + asm volatile("mfence" : : : "memory"); +} + static inline void native_apic_msr_write(u32 reg, u32 v) { if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR || @@ -91,8 +138,32 @@ static inline u32 native_apic_msr_read(u return low; } -#ifndef CONFIG_X86_32 -extern int x2apic; +static inline void native_x2apic_wait_icr_idle(void) +{ + /* no need to wait for icr idle in x2apic */ + return; +} + +static inline u32 native_safe_x2apic_wait_icr_idle(void) +{ + /* no need to wait for icr idle in x2apic */ + return 0; +} + +static inline void native_x2apic_icr_write(u32 low, u32 id) +{ + wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low); +} + +static inline u64 native_x2apic_icr_read(void) +{ + unsigned long val; + + rdmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), val); + return val; +} + +extern int x2apic, x2apic_phys; extern void check_x2apic(void); extern void enable_x2apic(void); extern void enable_IR_x2apic(void); @@ -110,30 +181,27 @@ static inline int x2apic_enabled(void) return 0; } #else -#define x2apic_enabled() 0 -#endif - -struct apic_ops { - u32 (*read)(u32 reg); - void (*write)(u32 reg, u32 v); - u64 (*icr_read)(void); - void (*icr_write)(u32 low, u32 high); - void (*wait_icr_idle)(void); - u32 (*safe_wait_icr_idle)(void); -}; +static inline void check_x2apic(void) +{ +} +static inline void enable_x2apic(void) +{ +} +static inline void enable_IR_x2apic(void) +{ +} +static inline int x2apic_enabled(void) +{ + return 0; +} -extern struct apic_ops *apic_ops; +#define x2apic 0 -#define apic_read (apic_ops->read) -#define apic_write (apic_ops->write) -#define apic_icr_read (apic_ops->icr_read) -#define apic_icr_write (apic_ops->icr_write) -#define apic_wait_icr_idle (apic_ops->wait_icr_idle) -#define safe_apic_wait_icr_idle (apic_ops->safe_wait_icr_idle) +#endif extern int get_physical_broadcast(void); -#ifdef CONFIG_X86_64 +#ifdef CONFIG_X86_X2APIC static inline void ack_x2APIC_irq(void) { /* Docs say use 0 for future compatibility */ @@ -141,18 +209,6 @@ static inline void ack_x2APIC_irq(void) } #endif - -static inline void ack_APIC_irq(void) -{ - /* - * ack_APIC_irq() actually gets compiled as a single instruction - * ... yummie. - */ - - /* Docs say use 0 for future compatibility */ - apic_write(APIC_EOI, 0); -} - extern int lapic_get_maxlvt(void); extern void clear_local_APIC(void); extern void connect_bsp_APIC(void); @@ -196,4 +252,329 @@ static inline void disable_local_APIC(vo #endif /* !CONFIG_X86_LOCAL_APIC */ +#ifdef CONFIG_X86_64 +#define SET_APIC_ID(x) (apic->set_apic_id(x)) +#else + +#endif + +/* + * Copyright 2004 James Cleverdon, IBM. + * Subject to the GNU Public License, v.2 + * + * Generic APIC sub-arch data struct. + * + * Hacked for x86-64 by James Cleverdon from i386 architecture code by + * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and + * James Cleverdon. + */ +struct apic { + char *name; + + int (*probe)(void); + int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id); + int (*apic_id_registered)(void); + + u32 irq_delivery_mode; + u32 irq_dest_mode; + + const struct cpumask *(*target_cpus)(void); + + int disable_esr; + + int dest_logical; + unsigned long (*check_apicid_used)(physid_mask_t bitmap, int apicid); + unsigned long (*check_apicid_present)(int apicid); + + void (*vector_allocation_domain)(int cpu, struct cpumask *retmask); + void (*init_apic_ldr)(void); + + physid_mask_t (*ioapic_phys_id_map)(physid_mask_t map); + + void (*setup_apic_routing)(void); + int (*multi_timer_check)(int apic, int irq); + int (*apicid_to_node)(int logical_apicid); + int (*cpu_to_logical_apicid)(int cpu); + int (*cpu_present_to_apicid)(int mps_cpu); + physid_mask_t (*apicid_to_cpu_present)(int phys_apicid); + void (*setup_portio_remap)(void); + int (*check_phys_apicid_present)(int boot_cpu_physical_apicid); + void (*enable_apic_mode)(void); + int (*phys_pkg_id)(int cpuid_apic, int index_msb); + + /* + * When one of the next two hooks returns 1 the apic + * is switched to this. Essentially they are additional + * probe functions: + */ + int (*mps_oem_check)(struct mpc_table *mpc, char *oem, char *productid); + + unsigned int (*get_apic_id)(unsigned long x); + unsigned long (*set_apic_id)(unsigned int id); + unsigned long apic_id_mask; + + unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask); + unsigned int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask, + const struct cpumask *andmask); + + /* ipi */ + void (*send_IPI_mask)(const struct cpumask *mask, int vector); + void (*send_IPI_mask_allbutself)(const struct cpumask *mask, + int vector); + void (*send_IPI_allbutself)(int vector); + void (*send_IPI_all)(int vector); + void (*send_IPI_self)(int vector); + + /* wakeup_secondary_cpu */ + int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip); + + int trampoline_phys_low; + int trampoline_phys_high; + + void (*wait_for_init_deassert)(atomic_t *deassert); + void (*smp_callin_clear_local_apic)(void); + void (*inquire_remote_apic)(int apicid); + + /* apic ops */ + u32 (*read)(u32 reg); + void (*write)(u32 reg, u32 v); + u64 (*icr_read)(void); + void (*icr_write)(u32 low, u32 high); + void (*wait_icr_idle)(void); + u32 (*safe_wait_icr_idle)(void); +}; + +/* + * Pointer to the local APIC driver in use on this system (there's + * always just one such driver in use - the kernel decides via an + * early probing process which one it picks - and then sticks to it): + */ +extern struct apic *apic; + +/* + * APIC functionality to boot other CPUs - only used on SMP: + */ +#ifdef CONFIG_SMP +extern atomic_t init_deasserted; +extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip); +#endif + +static inline u32 apic_read(u32 reg) +{ + return apic->read(reg); +} + +static inline void apic_write(u32 reg, u32 val) +{ + apic->write(reg, val); +} + +static inline u64 apic_icr_read(void) +{ + return apic->icr_read(); +} + +static inline void apic_icr_write(u32 low, u32 high) +{ + apic->icr_write(low, high); +} + +static inline void apic_wait_icr_idle(void) +{ + apic->wait_icr_idle(); +} + +static inline u32 safe_apic_wait_icr_idle(void) +{ + return apic->safe_wait_icr_idle(); +} + + +static inline void ack_APIC_irq(void) +{ +#ifdef CONFIG_X86_LOCAL_APIC + /* + * ack_APIC_irq() actually gets compiled as a single instruction + * ... yummie. + */ + + /* Docs say use 0 for future compatibility */ + apic_write(APIC_EOI, 0); +#endif +} + +static inline unsigned default_get_apic_id(unsigned long x) +{ + unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR)); + + if (APIC_XAPIC(ver)) + return (x >> 24) & 0xFF; + else + return (x >> 24) & 0x0F; +} + +/* + * Warm reset vector default position: + */ +#define DEFAULT_TRAMPOLINE_PHYS_LOW 0x467 +#define DEFAULT_TRAMPOLINE_PHYS_HIGH 0x469 + +#ifdef CONFIG_X86_64 +extern struct apic apic_flat; +extern struct apic apic_physflat; +extern struct apic apic_x2apic_cluster; +extern struct apic apic_x2apic_phys; +extern int default_acpi_madt_oem_check(char *, char *); + +extern void apic_send_IPI_self(int vector); + +extern struct apic apic_x2apic_uv_x; +DECLARE_PER_CPU(int, x2apic_extra_bits); + +extern int default_cpu_present_to_apicid(int mps_cpu); +extern int default_check_phys_apicid_present(int boot_cpu_physical_apicid); +#endif + +static inline void default_wait_for_init_deassert(atomic_t *deassert) +{ + while (!atomic_read(deassert)) + cpu_relax(); + return; +} + +extern void generic_bigsmp_probe(void); + + +#ifdef CONFIG_X86_LOCAL_APIC + +#include + +#define APIC_DFR_VALUE (APIC_DFR_FLAT) + +static inline const struct cpumask *default_target_cpus(void) +{ +#ifdef CONFIG_SMP + return cpu_online_mask; +#else + return cpumask_of(0); +#endif +} + +DECLARE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid); + + +static inline unsigned int read_apic_id(void) +{ + unsigned int reg; + + reg = apic_read(APIC_ID); + + return apic->get_apic_id(reg); +} + +extern void default_setup_apic_routing(void); + +#ifdef CONFIG_X86_32 +/* + * Set up the logical destination ID. + * + * Intel recommends to set DFR, LDR and TPR before enabling + * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel + * document number 292116). So here it goes... + */ +extern void default_init_apic_ldr(void); + +static inline int default_apic_id_registered(void) +{ + return physid_isset(read_apic_id(), phys_cpu_present_map); +} + +static inline int default_phys_pkg_id(int cpuid_apic, int index_msb) +{ + return cpuid_apic >> index_msb; +} + +extern int default_apicid_to_node(int logical_apicid); + +#endif + +static inline unsigned int +default_cpu_mask_to_apicid(const struct cpumask *cpumask) +{ + return cpumask_bits(cpumask)[0] & APIC_ALL_CPUS; +} + +static inline unsigned int +default_cpu_mask_to_apicid_and(const struct cpumask *cpumask, + const struct cpumask *andmask) +{ + unsigned long mask1 = cpumask_bits(cpumask)[0]; + unsigned long mask2 = cpumask_bits(andmask)[0]; + unsigned long mask3 = cpumask_bits(cpu_online_mask)[0]; + + return (unsigned int)(mask1 & mask2 & mask3); +} + +static inline unsigned long default_check_apicid_used(physid_mask_t bitmap, int apicid) +{ + return physid_isset(apicid, bitmap); +} + +static inline unsigned long default_check_apicid_present(int bit) +{ + return physid_isset(bit, phys_cpu_present_map); +} + +static inline physid_mask_t default_ioapic_phys_id_map(physid_mask_t phys_map) +{ + return phys_map; +} + +/* Mapping from cpu number to logical apicid */ +static inline int default_cpu_to_logical_apicid(int cpu) +{ + return 1 << cpu; +} + +static inline int __default_cpu_present_to_apicid(int mps_cpu) +{ + if (mps_cpu < nr_cpu_ids && cpu_present(mps_cpu)) + return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu); + else + return BAD_APICID; +} + +static inline int +__default_check_phys_apicid_present(int boot_cpu_physical_apicid) +{ + return physid_isset(boot_cpu_physical_apicid, phys_cpu_present_map); +} + +#ifdef CONFIG_X86_32 +static inline int default_cpu_present_to_apicid(int mps_cpu) +{ + return __default_cpu_present_to_apicid(mps_cpu); +} + +static inline int +default_check_phys_apicid_present(int boot_cpu_physical_apicid) +{ + return __default_check_phys_apicid_present(boot_cpu_physical_apicid); +} +#else +extern int default_cpu_present_to_apicid(int mps_cpu); +extern int default_check_phys_apicid_present(int boot_cpu_physical_apicid); +#endif + +static inline physid_mask_t default_apicid_to_cpu_present(int phys_apicid) +{ + return physid_mask_of_physid(phys_apicid); +} + +#endif /* CONFIG_X86_LOCAL_APIC */ + +#ifdef CONFIG_X86_32 +extern u8 cpu_2_logical_apicid[NR_CPUS]; +#endif + #endif /* _ASM_X86_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/apicdef.h +++ linux-2.6-tip/arch/x86/include/asm/apicdef.h @@ -53,6 +53,7 @@ #define APIC_ESR_SENDILL 0x00020 #define APIC_ESR_RECVILL 0x00040 #define APIC_ESR_ILLREGA 0x00080 +#define APIC_LVTCMCI 0x2f0 #define APIC_ICR 0x300 #define APIC_DEST_SELF 0x40000 #define APIC_DEST_ALLINC 0x80000 Index: linux-2.6-tip/arch/x86/include/asm/apicnum.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/apicnum.h @@ -0,0 +1,12 @@ +#ifndef _ASM_X86_APICNUM_H +#define _ASM_X86_APICNUM_H + +/* define MAX_IO_APICS */ +#ifdef CONFIG_X86_32 +# define MAX_IO_APICS 64 +#else +# define MAX_IO_APICS 128 +# define MAX_LOCAL_APIC 32768 +#endif + +#endif /* _ASM_X86_APICNUM_H */ Index: linux-2.6-tip/arch/x86/include/asm/apm.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/apm.h @@ -0,0 +1,73 @@ +/* + * Machine specific APM BIOS functions for generic. + * Split out from apm.c by Osamu Tomita + */ + +#ifndef _ASM_X86_MACH_DEFAULT_APM_H +#define _ASM_X86_MACH_DEFAULT_APM_H + +#ifdef APM_ZERO_SEGS +# define APM_DO_ZERO_SEGS \ + "pushl %%ds\n\t" \ + "pushl %%es\n\t" \ + "xorl %%edx, %%edx\n\t" \ + "mov %%dx, %%ds\n\t" \ + "mov %%dx, %%es\n\t" \ + "mov %%dx, %%fs\n\t" \ + "mov %%dx, %%gs\n\t" +# define APM_DO_POP_SEGS \ + "popl %%es\n\t" \ + "popl %%ds\n\t" +#else +# define APM_DO_ZERO_SEGS +# define APM_DO_POP_SEGS +#endif + +static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in, + u32 *eax, u32 *ebx, u32 *ecx, + u32 *edx, u32 *esi) +{ + /* + * N.B. We do NOT need a cld after the BIOS call + * because we always save and restore the flags. + */ + __asm__ __volatile__(APM_DO_ZERO_SEGS + "pushl %%edi\n\t" + "pushl %%ebp\n\t" + "lcall *%%cs:apm_bios_entry\n\t" + "setc %%al\n\t" + "popl %%ebp\n\t" + "popl %%edi\n\t" + APM_DO_POP_SEGS + : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx), + "=S" (*esi) + : "a" (func), "b" (ebx_in), "c" (ecx_in) + : "memory", "cc"); +} + +static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in, + u32 ecx_in, u32 *eax) +{ + int cx, dx, si; + u8 error; + + /* + * N.B. We do NOT need a cld after the BIOS call + * because we always save and restore the flags. + */ + __asm__ __volatile__(APM_DO_ZERO_SEGS + "pushl %%edi\n\t" + "pushl %%ebp\n\t" + "lcall *%%cs:apm_bios_entry\n\t" + "setc %%bl\n\t" + "popl %%ebp\n\t" + "popl %%edi\n\t" + APM_DO_POP_SEGS + : "=a" (*eax), "=b" (error), "=c" (cx), "=d" (dx), + "=S" (si) + : "a" (func), "b" (ebx_in), "c" (ecx_in) + : "memory", "cc"); + return error; +} + +#endif /* _ASM_X86_MACH_DEFAULT_APM_H */ Index: linux-2.6-tip/arch/x86/include/asm/arch_hooks.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/arch_hooks.h +++ /dev/null @@ -1,26 +0,0 @@ -#ifndef _ASM_X86_ARCH_HOOKS_H -#define _ASM_X86_ARCH_HOOKS_H - -#include - -/* - * linux/include/asm/arch_hooks.h - * - * define the architecture specific hooks - */ - -/* these aren't arch hooks, they are generic routines - * that can be used by the hooks */ -extern void init_ISA_irqs(void); -extern irqreturn_t timer_interrupt(int irq, void *dev_id); - -/* these are the defined hooks */ -extern void intr_init_hook(void); -extern void pre_intr_init_hook(void); -extern void pre_setup_arch_hook(void); -extern void trap_init_hook(void); -extern void pre_time_init_hook(void); -extern void time_init_hook(void); -extern void mca_nmi_hook(void); - -#endif /* _ASM_X86_ARCH_HOOKS_H */ Index: linux-2.6-tip/arch/x86/include/asm/atomic_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/atomic_32.h +++ linux-2.6-tip/arch/x86/include/asm/atomic_32.h @@ -180,10 +180,10 @@ static inline int atomic_add_return(int #ifdef CONFIG_M386 no_xadd: /* Legacy 386 processor */ - local_irq_save(flags); + raw_local_irq_save(flags); __i = atomic_read(v); atomic_set(v, i + __i); - local_irq_restore(flags); + raw_local_irq_restore(flags); return i + __i; #endif } @@ -247,5 +247,223 @@ static inline int atomic_add_unless(atom #define smp_mb__before_atomic_inc() barrier() #define smp_mb__after_atomic_inc() barrier() +/* An 64bit atomic type */ + +typedef struct { + unsigned long long counter; +} atomic64_t; + +#define ATOMIC64_INIT(val) { (val) } + +/** + * atomic64_read - read atomic64 variable + * @v: pointer of type atomic64_t + * + * Atomically reads the value of @v. + * Doesn't imply a read memory barrier. + */ +#define __atomic64_read(ptr) ((ptr)->counter) + +static inline unsigned long long +cmpxchg8b(unsigned long long *ptr, unsigned long long old, unsigned long long new) +{ + asm volatile( + + LOCK_PREFIX "cmpxchg8b (%[ptr])\n" + + : "=A" (old) + + : [ptr] "D" (ptr), + "A" (old), + "b" (ll_low(new)), + "c" (ll_high(new)) + + : "memory"); + + return old; +} + +static inline unsigned long long +atomic64_cmpxchg(atomic64_t *ptr, unsigned long long old_val, + unsigned long long new_val) +{ + return cmpxchg8b(&ptr->counter, old_val, new_val); +} + +/** + * atomic64_set - set atomic64 variable + * @ptr: pointer to type atomic64_t + * @new_val: value to assign + * + * Atomically sets the value of @ptr to @new_val. + */ +static inline void atomic64_set(atomic64_t *ptr, unsigned long long new_val) +{ + unsigned long long old_val; + + do { + old_val = atomic_read(ptr); + } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val); +} + +/** + * atomic64_read - read atomic64 variable + * @ptr: pointer to type atomic64_t + * + * Atomically reads the value of @ptr and returns it. + */ +static inline unsigned long long atomic64_read(atomic64_t *ptr) +{ + unsigned long long curr_val; + + do { + curr_val = __atomic64_read(ptr); + } while (atomic64_cmpxchg(ptr, curr_val, curr_val) != curr_val); + + return curr_val; +} + +/** + * atomic64_add_return - add and return + * @delta: integer value to add + * @ptr: pointer to type atomic64_t + * + * Atomically adds @delta to @ptr and returns @delta + *@ptr + */ +static inline unsigned long long +atomic64_add_return(unsigned long long delta, atomic64_t *ptr) +{ + unsigned long long old_val, new_val; + + do { + old_val = atomic_read(ptr); + new_val = old_val + delta; + + } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val); + + return new_val; +} + +static inline long atomic64_sub_return(unsigned long long delta, atomic64_t *ptr) +{ + return atomic64_add_return(-delta, ptr); +} + +static inline long atomic64_inc_return(atomic64_t *ptr) +{ + return atomic64_add_return(1, ptr); +} + +static inline long atomic64_dec_return(atomic64_t *ptr) +{ + return atomic64_sub_return(1, ptr); +} + +/** + * atomic64_add - add integer to atomic64 variable + * @delta: integer value to add + * @ptr: pointer to type atomic64_t + * + * Atomically adds @delta to @ptr. + */ +static inline void atomic64_add(unsigned long long delta, atomic64_t *ptr) +{ + atomic64_add_return(delta, ptr); +} + +/** + * atomic64_sub - subtract the atomic64 variable + * @delta: integer value to subtract + * @ptr: pointer to type atomic64_t + * + * Atomically subtracts @delta from @ptr. + */ +static inline void atomic64_sub(unsigned long long delta, atomic64_t *ptr) +{ + atomic64_add(-delta, ptr); +} + +/** + * atomic64_sub_and_test - subtract value from variable and test result + * @delta: integer value to subtract + * @ptr: pointer to type atomic64_t + * + * Atomically subtracts @delta from @ptr and returns + * true if the result is zero, or false for all + * other cases. + */ +static inline int +atomic64_sub_and_test(unsigned long long delta, atomic64_t *ptr) +{ + unsigned long long old_val = atomic64_sub_return(delta, ptr); + + return old_val == 0; +} + +/** + * atomic64_inc - increment atomic64 variable + * @ptr: pointer to type atomic64_t + * + * Atomically increments @ptr by 1. + */ +static inline void atomic64_inc(atomic64_t *ptr) +{ + atomic64_add(1, ptr); +} + +/** + * atomic64_dec - decrement atomic64 variable + * @ptr: pointer to type atomic64_t + * + * Atomically decrements @ptr by 1. + */ +static inline void atomic64_dec(atomic64_t *ptr) +{ + atomic64_sub(1, ptr); +} + +/** + * atomic64_dec_and_test - decrement and test + * @ptr: pointer to type atomic64_t + * + * Atomically decrements @ptr by 1 and + * returns true if the result is 0, or false for all other + * cases. + */ +static inline int atomic64_dec_and_test(atomic64_t *ptr) +{ + return atomic64_sub_and_test(1, ptr); +} + +/** + * atomic64_inc_and_test - increment and test + * @ptr: pointer to type atomic64_t + * + * Atomically increments @ptr by 1 + * and returns true if the result is zero, or false for all + * other cases. + */ +static inline int atomic64_inc_and_test(atomic64_t *ptr) +{ + return atomic64_sub_and_test(-1, ptr); +} + +/** + * atomic64_add_negative - add and test if negative + * @delta: integer value to add + * @ptr: pointer to type atomic64_t + * + * Atomically adds @delta to @ptr and returns true + * if the result is negative, or false when + * result is greater than or equal to zero. + */ +static inline int +atomic64_add_negative(unsigned long long delta, atomic64_t *ptr) +{ + long long old_val = atomic64_add_return(delta, ptr); + + return old_val < 0; +} + #include #endif /* _ASM_X86_ATOMIC_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/bigsmp/apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/bigsmp/apic.h +++ /dev/null @@ -1,155 +0,0 @@ -#ifndef __ASM_MACH_APIC_H -#define __ASM_MACH_APIC_H - -#define xapic_phys_to_log_apicid(cpu) (per_cpu(x86_bios_cpu_apicid, cpu)) -#define esr_disable (1) - -static inline int apic_id_registered(void) -{ - return (1); -} - -static inline const cpumask_t *target_cpus(void) -{ -#ifdef CONFIG_SMP - return &cpu_online_map; -#else - return &cpumask_of_cpu(0); -#endif -} - -#undef APIC_DEST_LOGICAL -#define APIC_DEST_LOGICAL 0 -#define APIC_DFR_VALUE (APIC_DFR_FLAT) -#define INT_DELIVERY_MODE (dest_Fixed) -#define INT_DEST_MODE (0) /* phys delivery to target proc */ -#define NO_BALANCE_IRQ (0) - -static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid) -{ - return (0); -} - -static inline unsigned long check_apicid_present(int bit) -{ - return (1); -} - -static inline unsigned long calculate_ldr(int cpu) -{ - unsigned long val, id; - val = apic_read(APIC_LDR) & ~APIC_LDR_MASK; - id = xapic_phys_to_log_apicid(cpu); - val |= SET_APIC_LOGICAL_ID(id); - return val; -} - -/* - * Set up the logical destination ID. - * - * Intel recommends to set DFR, LDR and TPR before enabling - * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel - * document number 292116). So here it goes... - */ -static inline void init_apic_ldr(void) -{ - unsigned long val; - int cpu = smp_processor_id(); - - apic_write(APIC_DFR, APIC_DFR_VALUE); - val = calculate_ldr(cpu); - apic_write(APIC_LDR, val); -} - -static inline void setup_apic_routing(void) -{ - printk("Enabling APIC mode: %s. Using %d I/O APICs\n", - "Physflat", nr_ioapics); -} - -static inline int multi_timer_check(int apic, int irq) -{ - return (0); -} - -static inline int apicid_to_node(int logical_apicid) -{ - return apicid_2_node[hard_smp_processor_id()]; -} - -static inline int cpu_present_to_apicid(int mps_cpu) -{ - if (mps_cpu < nr_cpu_ids) - return (int) per_cpu(x86_bios_cpu_apicid, mps_cpu); - - return BAD_APICID; -} - -static inline physid_mask_t apicid_to_cpu_present(int phys_apicid) -{ - return physid_mask_of_physid(phys_apicid); -} - -extern u8 cpu_2_logical_apicid[]; -/* Mapping from cpu number to logical apicid */ -static inline int cpu_to_logical_apicid(int cpu) -{ - if (cpu >= nr_cpu_ids) - return BAD_APICID; - return cpu_physical_id(cpu); -} - -static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map) -{ - /* For clustered we don't have a good way to do this yet - hack */ - return physids_promote(0xFFL); -} - -static inline void setup_portio_remap(void) -{ -} - -static inline void enable_apic_mode(void) -{ -} - -static inline int check_phys_apicid_present(int boot_cpu_physical_apicid) -{ - return (1); -} - -/* As we are using single CPU as destination, pick only one CPU here */ -static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask) -{ - int cpu; - int apicid; - - cpu = first_cpu(*cpumask); - apicid = cpu_to_logical_apicid(cpu); - return apicid; -} - -static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask, - const struct cpumask *andmask) -{ - int cpu; - - /* - * We're using fixed IRQ delivery, can only return one phys APIC ID. - * May as well be the first. - */ - for_each_cpu_and(cpu, cpumask, andmask) - if (cpumask_test_cpu(cpu, cpu_online_mask)) - break; - if (cpu < nr_cpu_ids) - return cpu_to_logical_apicid(cpu); - - return BAD_APICID; -} - -static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb) -{ - return cpuid_apic >> index_msb; -} - -#endif /* __ASM_MACH_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/bigsmp/apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/bigsmp/apicdef.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_MACH_APICDEF_H -#define __ASM_MACH_APICDEF_H - -#define APIC_ID_MASK (0xFF<<24) - -static inline unsigned get_apic_id(unsigned long x) -{ - return (((x)>>24)&0xFF); -} - -#define GET_APIC_ID(x) get_apic_id(x) - -#endif Index: linux-2.6-tip/arch/x86/include/asm/bigsmp/ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/bigsmp/ipi.h +++ /dev/null @@ -1,22 +0,0 @@ -#ifndef __ASM_MACH_IPI_H -#define __ASM_MACH_IPI_H - -void send_IPI_mask_sequence(const struct cpumask *mask, int vector); -void send_IPI_mask_allbutself(const struct cpumask *mask, int vector); - -static inline void send_IPI_mask(const struct cpumask *mask, int vector) -{ - send_IPI_mask_sequence(mask, vector); -} - -static inline void send_IPI_allbutself(int vector) -{ - send_IPI_mask_allbutself(cpu_online_mask, vector); -} - -static inline void send_IPI_all(int vector) -{ - send_IPI_mask(cpu_online_mask, vector); -} - -#endif /* __ASM_MACH_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/boot.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/boot.h +++ linux-2.6-tip/arch/x86/include/asm/boot.h @@ -1,26 +1,36 @@ #ifndef _ASM_X86_BOOT_H #define _ASM_X86_BOOT_H -/* Don't touch these, unless you really know what you're doing. */ -#define DEF_SYSSEG 0x1000 -#define DEF_SYSSIZE 0x7F00 - /* Internal svga startup constants */ #define NORMAL_VGA 0xffff /* 80x25 mode */ #define EXTENDED_VGA 0xfffe /* 80x50 mode */ #define ASK_VGA 0xfffd /* ask for it at bootup */ +#ifdef __KERNEL__ + /* Physical address where kernel should be loaded. */ #define LOAD_PHYSICAL_ADDR ((CONFIG_PHYSICAL_START \ + (CONFIG_PHYSICAL_ALIGN - 1)) \ & ~(CONFIG_PHYSICAL_ALIGN - 1)) +#ifdef CONFIG_KERNEL_BZIP2 +#define BOOT_HEAP_SIZE 0x400000 +#else /* !CONFIG_KERNEL_BZIP2 */ + #ifdef CONFIG_X86_64 #define BOOT_HEAP_SIZE 0x7000 -#define BOOT_STACK_SIZE 0x4000 #else #define BOOT_HEAP_SIZE 0x4000 +#endif + +#endif /* !CONFIG_KERNEL_BZIP2 */ + +#ifdef CONFIG_X86_64 +#define BOOT_STACK_SIZE 0x4000 +#else #define BOOT_STACK_SIZE 0x1000 #endif +#endif /* __KERNEL__ */ + #endif /* _ASM_X86_BOOT_H */ Index: linux-2.6-tip/arch/x86/include/asm/cacheflush.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/cacheflush.h +++ linux-2.6-tip/arch/x86/include/asm/cacheflush.h @@ -5,24 +5,43 @@ #include /* Caches aren't brain-dead on the intel. */ -#define flush_cache_all() do { } while (0) -#define flush_cache_mm(mm) do { } while (0) -#define flush_cache_dup_mm(mm) do { } while (0) -#define flush_cache_range(vma, start, end) do { } while (0) -#define flush_cache_page(vma, vmaddr, pfn) do { } while (0) -#define flush_dcache_page(page) do { } while (0) -#define flush_dcache_mmap_lock(mapping) do { } while (0) -#define flush_dcache_mmap_unlock(mapping) do { } while (0) -#define flush_icache_range(start, end) do { } while (0) -#define flush_icache_page(vma, pg) do { } while (0) -#define flush_icache_user_range(vma, pg, adr, len) do { } while (0) -#define flush_cache_vmap(start, end) do { } while (0) -#define flush_cache_vunmap(start, end) do { } while (0) - -#define copy_to_user_page(vma, page, vaddr, dst, src, len) \ - memcpy((dst), (src), (len)) -#define copy_from_user_page(vma, page, vaddr, dst, src, len) \ - memcpy((dst), (src), (len)) +static inline void flush_cache_all(void) { } +static inline void flush_cache_mm(struct mm_struct *mm) { } +static inline void flush_cache_dup_mm(struct mm_struct *mm) { } +static inline void flush_cache_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end) { } +static inline void flush_cache_page(struct vm_area_struct *vma, + unsigned long vmaddr, unsigned long pfn) { } +static inline void flush_dcache_page(struct page *page) { } +static inline void flush_dcache_mmap_lock(struct address_space *mapping) { } +static inline void flush_dcache_mmap_unlock(struct address_space *mapping) { } +static inline void flush_icache_range(unsigned long start, + unsigned long end) { } +static inline void flush_icache_page(struct vm_area_struct *vma, + struct page *page) { } +static inline void flush_icache_user_range(struct vm_area_struct *vma, + struct page *page, + unsigned long addr, + unsigned long len) { } +static inline void flush_cache_vmap(unsigned long start, unsigned long end) { } +static inline void flush_cache_vunmap(unsigned long start, + unsigned long end) { } + +static inline void copy_to_user_page(struct vm_area_struct *vma, + struct page *page, unsigned long vaddr, + void *dst, const void *src, + unsigned long len) +{ + memcpy(dst, src, len); +} + +static inline void copy_from_user_page(struct vm_area_struct *vma, + struct page *page, unsigned long vaddr, + void *dst, const void *src, + unsigned long len) +{ + memcpy(dst, src, len); +} #define PG_non_WB PG_arch_1 PAGEFLAG(NonWB, non_WB) @@ -71,6 +90,9 @@ int set_memory_4k(unsigned long addr, in int set_memory_array_uc(unsigned long *addr, int addrinarray); int set_memory_array_wb(unsigned long *addr, int addrinarray); +int set_pages_array_uc(struct page **pages, int addrinarray); +int set_pages_array_wb(struct page **pages, int addrinarray); + /* * For legacy compatibility with the old APIs, a few functions * are provided that work on a "struct page". @@ -104,6 +126,11 @@ void clflush_cache_range(void *addr, uns #ifdef CONFIG_DEBUG_RODATA void mark_rodata_ro(void); extern const int rodata_test_data; +void set_kernel_text_rw(void); +void set_kernel_text_ro(void); +#else +static inline void set_kernel_text_rw(void) { } +static inline void set_kernel_text_ro(void) { } #endif #ifdef CONFIG_DEBUG_RODATA_TEST Index: linux-2.6-tip/arch/x86/include/asm/calling.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/calling.h +++ linux-2.6-tip/arch/x86/include/asm/calling.h @@ -1,5 +1,55 @@ /* - * Some macros to handle stack frames in assembly. + + x86 function call convention, 64-bit: + ------------------------------------- + arguments | callee-saved | extra caller-saved | return + [callee-clobbered] | | [callee-clobbered] | + --------------------------------------------------------------------------- + rdi rsi rdx rcx r8-9 | rbx rbp [*] r12-15 | r10-11 | rax, rdx [**] + + ( rsp is obviously invariant across normal function calls. (gcc can 'merge' + functions when it sees tail-call optimization possibilities) rflags is + clobbered. Leftover arguments are passed over the stack frame.) + + [*] In the frame-pointers case rbp is fixed to the stack frame. + + [**] for struct return values wider than 64 bits the return convention is a + bit more complex: up to 128 bits width we return small structures + straight in rax, rdx. For structures larger than that (3 words or + larger) the caller puts a pointer to an on-stack return struct + [allocated in the caller's stack frame] into the first argument - i.e. + into rdi. All other arguments shift up by one in this case. + Fortunately this case is rare in the kernel. + +For 32-bit we have the following conventions - kernel is built with +-mregparm=3 and -freg-struct-return: + + x86 function calling convention, 32-bit: + ---------------------------------------- + arguments | callee-saved | extra caller-saved | return + [callee-clobbered] | | [callee-clobbered] | + ------------------------------------------------------------------------- + eax edx ecx | ebx edi esi ebp [*] | | eax, edx [**] + + ( here too esp is obviously invariant across normal function calls. eflags + is clobbered. Leftover arguments are passed over the stack frame. ) + + [*] In the frame-pointers case ebp is fixed to the stack frame. + + [**] We build with -freg-struct-return, which on 32-bit means similar + semantics as on 64-bit: edx can be used for a second return value + (i.e. covering integer and structure sizes up to 64 bits) - after that + it gets more complex and more expensive: 3-word or larger struct returns + get done in the caller's frame and the pointer to the return struct goes + into regparm0, i.e. eax - the other arguments shift up and the + function's register parameters degenerate to regparm=2 in essence. + +*/ + + +/* + * 64-bit system call stack frame layout defines and helpers, + * for assembly code: */ #define R15 0 @@ -9,7 +59,7 @@ #define RBP 32 #define RBX 40 -/* arguments: interrupts/non tracing syscalls only save upto here*/ +/* arguments: interrupts/non tracing syscalls only save up to here: */ #define R11 48 #define R10 56 #define R9 64 @@ -22,7 +72,7 @@ #define ORIG_RAX 120 /* + error_code */ /* end of arguments */ -/* cpu exception frame or undefined in case of fast syscall. */ +/* cpu exception frame or undefined in case of fast syscall: */ #define RIP 128 #define CS 136 #define EFLAGS 144 Index: linux-2.6-tip/arch/x86/include/asm/cpu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/cpu.h +++ linux-2.6-tip/arch/x86/include/asm/cpu.h @@ -7,6 +7,20 @@ #include #include +#ifdef CONFIG_SMP + +extern void prefill_possible_map(void); + +#else /* CONFIG_SMP */ + +static inline void prefill_possible_map(void) {} + +#define cpu_physical_id(cpu) boot_cpu_physical_apicid +#define safe_smp_processor_id() 0 +#define stack_smp_processor_id() 0 + +#endif /* CONFIG_SMP */ + struct x86_cpu { struct cpu cpu; }; @@ -17,4 +31,7 @@ extern void arch_unregister_cpu(int); #endif DECLARE_PER_CPU(int, cpu_state); + +extern unsigned int boot_cpu_id; + #endif /* _ASM_X86_CPU_H */ Index: linux-2.6-tip/arch/x86/include/asm/cpu_debug.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/cpu_debug.h @@ -0,0 +1,226 @@ +#ifndef _ASM_X86_CPU_DEBUG_H +#define _ASM_X86_CPU_DEBUG_H + +/* + * CPU x86 architecture debug + * + * Copyright(C) 2009 Jaswinder Singh Rajput + */ + +/* Register flags */ +enum cpu_debug_bit { +/* Model Specific Registers (MSRs) */ + CPU_MC_BIT, /* Machine Check */ + CPU_MONITOR_BIT, /* Monitor */ + CPU_TIME_BIT, /* Time */ + CPU_PMC_BIT, /* Performance Monitor */ + CPU_PLATFORM_BIT, /* Platform */ + CPU_APIC_BIT, /* APIC */ + CPU_POWERON_BIT, /* Power-on */ + CPU_CONTROL_BIT, /* Control */ + CPU_FEATURES_BIT, /* Features control */ + CPU_LBRANCH_BIT, /* Last Branch */ + CPU_BIOS_BIT, /* BIOS */ + CPU_FREQ_BIT, /* Frequency */ + CPU_MTTR_BIT, /* MTRR */ + CPU_PERF_BIT, /* Performance */ + CPU_CACHE_BIT, /* Cache */ + CPU_SYSENTER_BIT, /* Sysenter */ + CPU_THERM_BIT, /* Thermal */ + CPU_MISC_BIT, /* Miscellaneous */ + CPU_DEBUG_BIT, /* Debug */ + CPU_PAT_BIT, /* PAT */ + CPU_VMX_BIT, /* VMX */ + CPU_CALL_BIT, /* System Call */ + CPU_BASE_BIT, /* BASE Address */ + CPU_VER_BIT, /* Version ID */ + CPU_CONF_BIT, /* Configuration */ + CPU_SMM_BIT, /* System mgmt mode */ + CPU_SVM_BIT, /*Secure Virtual Machine*/ + CPU_OSVM_BIT, /* OS-Visible Workaround*/ +/* Standard Registers */ + CPU_TSS_BIT, /* Task Stack Segment */ + CPU_CR_BIT, /* Control Registers */ + CPU_DT_BIT, /* Descriptor Table */ +/* End of Registers flags */ + CPU_REG_ALL_BIT, /* Select all Registers */ +}; + +#define CPU_REG_ALL (~0) /* Select all Registers */ + +#define CPU_MC (1 << CPU_MC_BIT) +#define CPU_MONITOR (1 << CPU_MONITOR_BIT) +#define CPU_TIME (1 << CPU_TIME_BIT) +#define CPU_PMC (1 << CPU_PMC_BIT) +#define CPU_PLATFORM (1 << CPU_PLATFORM_BIT) +#define CPU_APIC (1 << CPU_APIC_BIT) +#define CPU_POWERON (1 << CPU_POWERON_BIT) +#define CPU_CONTROL (1 << CPU_CONTROL_BIT) +#define CPU_FEATURES (1 << CPU_FEATURES_BIT) +#define CPU_LBRANCH (1 << CPU_LBRANCH_BIT) +#define CPU_BIOS (1 << CPU_BIOS_BIT) +#define CPU_FREQ (1 << CPU_FREQ_BIT) +#define CPU_MTRR (1 << CPU_MTTR_BIT) +#define CPU_PERF (1 << CPU_PERF_BIT) +#define CPU_CACHE (1 << CPU_CACHE_BIT) +#define CPU_SYSENTER (1 << CPU_SYSENTER_BIT) +#define CPU_THERM (1 << CPU_THERM_BIT) +#define CPU_MISC (1 << CPU_MISC_BIT) +#define CPU_DEBUG (1 << CPU_DEBUG_BIT) +#define CPU_PAT (1 << CPU_PAT_BIT) +#define CPU_VMX (1 << CPU_VMX_BIT) +#define CPU_CALL (1 << CPU_CALL_BIT) +#define CPU_BASE (1 << CPU_BASE_BIT) +#define CPU_VER (1 << CPU_VER_BIT) +#define CPU_CONF (1 << CPU_CONF_BIT) +#define CPU_SMM (1 << CPU_SMM_BIT) +#define CPU_SVM (1 << CPU_SVM_BIT) +#define CPU_OSVM (1 << CPU_OSVM_BIT) +#define CPU_TSS (1 << CPU_TSS_BIT) +#define CPU_CR (1 << CPU_CR_BIT) +#define CPU_DT (1 << CPU_DT_BIT) + +/* Register file flags */ +enum cpu_file_bit { + CPU_INDEX_BIT, /* index */ + CPU_VALUE_BIT, /* value */ +}; + +#define CPU_FILE_VALUE (1 << CPU_VALUE_BIT) + +/* + * DisplayFamily_DisplayModel Processor Families/Processor Number Series + * -------------------------- ------------------------------------------ + * 05_01, 05_02, 05_04 Pentium, Pentium with MMX + * + * 06_01 Pentium Pro + * 06_03, 06_05 Pentium II Xeon, Pentium II + * 06_07, 06_08, 06_0A, 06_0B Pentium III Xeon, Pentum III + * + * 06_09, 060D Pentium M + * + * 06_0E Core Duo, Core Solo + * + * 06_0F Xeon 3000, 3200, 5100, 5300, 7300 series, + * Core 2 Quad, Core 2 Extreme, Core 2 Duo, + * Pentium dual-core + * 06_17 Xeon 5200, 5400 series, Core 2 Quad Q9650 + * + * 06_1C Atom + * + * 0F_00, 0F_01, 0F_02 Xeon, Xeon MP, Pentium 4 + * 0F_03, 0F_04 Xeon, Xeon MP, Pentium 4, Pentium D + * + * 0F_06 Xeon 7100, 5000 Series, Xeon MP, + * Pentium 4, Pentium D + */ + +/* Register processors bits */ +enum cpu_processor_bit { + CPU_NONE, +/* Intel */ + CPU_INTEL_PENTIUM_BIT, + CPU_INTEL_P6_BIT, + CPU_INTEL_PENTIUM_M_BIT, + CPU_INTEL_CORE_BIT, + CPU_INTEL_CORE2_BIT, + CPU_INTEL_ATOM_BIT, + CPU_INTEL_XEON_P4_BIT, + CPU_INTEL_XEON_MP_BIT, +/* AMD */ + CPU_AMD_K6_BIT, + CPU_AMD_K7_BIT, + CPU_AMD_K8_BIT, + CPU_AMD_0F_BIT, + CPU_AMD_10_BIT, + CPU_AMD_11_BIT, +}; + +#define CPU_INTEL_PENTIUM (1 << CPU_INTEL_PENTIUM_BIT) +#define CPU_INTEL_P6 (1 << CPU_INTEL_P6_BIT) +#define CPU_INTEL_PENTIUM_M (1 << CPU_INTEL_PENTIUM_M_BIT) +#define CPU_INTEL_CORE (1 << CPU_INTEL_CORE_BIT) +#define CPU_INTEL_CORE2 (1 << CPU_INTEL_CORE2_BIT) +#define CPU_INTEL_ATOM (1 << CPU_INTEL_ATOM_BIT) +#define CPU_INTEL_XEON_P4 (1 << CPU_INTEL_XEON_P4_BIT) +#define CPU_INTEL_XEON_MP (1 << CPU_INTEL_XEON_MP_BIT) + +#define CPU_INTEL_PX (CPU_INTEL_P6 | CPU_INTEL_PENTIUM_M) +#define CPU_INTEL_COREX (CPU_INTEL_CORE | CPU_INTEL_CORE2) +#define CPU_INTEL_XEON (CPU_INTEL_XEON_P4 | CPU_INTEL_XEON_MP) +#define CPU_CO_AT (CPU_INTEL_CORE | CPU_INTEL_ATOM) +#define CPU_C2_AT (CPU_INTEL_CORE2 | CPU_INTEL_ATOM) +#define CPU_CX_AT (CPU_INTEL_COREX | CPU_INTEL_ATOM) +#define CPU_CX_XE (CPU_INTEL_COREX | CPU_INTEL_XEON) +#define CPU_P6_XE (CPU_INTEL_P6 | CPU_INTEL_XEON) +#define CPU_PM_CO_AT (CPU_INTEL_PENTIUM_M | CPU_CO_AT) +#define CPU_C2_AT_XE (CPU_C2_AT | CPU_INTEL_XEON) +#define CPU_CX_AT_XE (CPU_CX_AT | CPU_INTEL_XEON) +#define CPU_P6_CX_AT (CPU_INTEL_P6 | CPU_CX_AT) +#define CPU_P6_CX_XE (CPU_P6_XE | CPU_INTEL_COREX) +#define CPU_P6_CX_AT_XE (CPU_INTEL_P6 | CPU_CX_AT_XE) +#define CPU_PM_CX_AT_XE (CPU_INTEL_PENTIUM_M | CPU_CX_AT_XE) +#define CPU_PM_CX_AT (CPU_INTEL_PENTIUM_M | CPU_CX_AT) +#define CPU_PM_CX_XE (CPU_INTEL_PENTIUM_M | CPU_CX_XE) +#define CPU_PX_CX_AT (CPU_INTEL_PX | CPU_CX_AT) +#define CPU_PX_CX_AT_XE (CPU_INTEL_PX | CPU_CX_AT_XE) + +/* Select all supported Intel CPUs */ +#define CPU_INTEL_ALL (CPU_INTEL_PENTIUM | CPU_PX_CX_AT_XE) + +#define CPU_AMD_K6 (1 << CPU_AMD_K6_BIT) +#define CPU_AMD_K7 (1 << CPU_AMD_K7_BIT) +#define CPU_AMD_K8 (1 << CPU_AMD_K8_BIT) +#define CPU_AMD_0F (1 << CPU_AMD_0F_BIT) +#define CPU_AMD_10 (1 << CPU_AMD_10_BIT) +#define CPU_AMD_11 (1 << CPU_AMD_11_BIT) + +#define CPU_K10_PLUS (CPU_AMD_10 | CPU_AMD_11) +#define CPU_K0F_PLUS (CPU_AMD_0F | CPU_K10_PLUS) +#define CPU_K8_PLUS (CPU_AMD_K8 | CPU_K0F_PLUS) +#define CPU_K7_PLUS (CPU_AMD_K7 | CPU_K8_PLUS) + +/* Select all supported AMD CPUs */ +#define CPU_AMD_ALL (CPU_AMD_K6 | CPU_K7_PLUS) + +/* Select all supported CPUs */ +#define CPU_ALL (CPU_INTEL_ALL | CPU_AMD_ALL) + +#define MAX_CPU_FILES 512 + +struct cpu_private { + unsigned cpu; + unsigned type; + unsigned reg; + unsigned file; +}; + +struct cpu_debug_base { + char *name; /* Register name */ + unsigned flag; /* Register flag */ + unsigned write; /* Register write flag */ +}; + +/* + * Currently it looks similar to cpu_debug_base but once we add more files + * cpu_file_base will go in different direction + */ +struct cpu_file_base { + char *name; /* Register file name */ + unsigned flag; /* Register file flag */ + unsigned write; /* Register write flag */ +}; + +struct cpu_cpuX_base { + struct dentry *dentry; /* Register dentry */ + int init; /* Register index file */ +}; + +struct cpu_debug_range { + unsigned min; /* Register range min */ + unsigned max; /* Register range max */ + unsigned flag; /* Supported flags */ + unsigned model; /* Supported models */ +}; + +#endif /* _ASM_X86_CPU_DEBUG_H */ Index: linux-2.6-tip/arch/x86/include/asm/cpumask.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/cpumask.h @@ -0,0 +1,14 @@ +#ifndef _ASM_X86_CPUMASK_H +#define _ASM_X86_CPUMASK_H +#ifndef __ASSEMBLY__ +#include + +extern cpumask_var_t cpu_callin_mask; +extern cpumask_var_t cpu_callout_mask; +extern cpumask_var_t cpu_initialized_mask; +extern cpumask_var_t cpu_sibling_setup_mask; + +extern void setup_cpu_local_masks(void); + +#endif /* __ASSEMBLY__ */ +#endif /* _ASM_X86_CPUMASK_H */ Index: linux-2.6-tip/arch/x86/include/asm/current.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/current.h +++ linux-2.6-tip/arch/x86/include/asm/current.h @@ -1,39 +1,21 @@ #ifndef _ASM_X86_CURRENT_H #define _ASM_X86_CURRENT_H -#ifdef CONFIG_X86_32 #include #include +#ifndef __ASSEMBLY__ struct task_struct; DECLARE_PER_CPU(struct task_struct *, current_task); -static __always_inline struct task_struct *get_current(void) -{ - return x86_read_percpu(current_task); -} - -#else /* X86_32 */ - -#ifndef __ASSEMBLY__ -#include - -struct task_struct; static __always_inline struct task_struct *get_current(void) { - return read_pda(pcurrent); + return percpu_read(current_task); } -#else /* __ASSEMBLY__ */ - -#include -#define GET_CURRENT(reg) movq %gs:(pda_pcurrent),reg +#define current get_current() #endif /* __ASSEMBLY__ */ -#endif /* X86_32 */ - -#define current get_current() - #endif /* _ASM_X86_CURRENT_H */ Index: linux-2.6-tip/arch/x86/include/asm/desc.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/desc.h +++ linux-2.6-tip/arch/x86/include/asm/desc.h @@ -91,7 +91,6 @@ static inline int desc_empty(const void #define store_gdt(dtr) native_store_gdt(dtr) #define store_idt(dtr) native_store_idt(dtr) #define store_tr(tr) (tr = native_store_tr()) -#define store_ldt(ldt) asm("sldt %0":"=m" (ldt)) #define load_TLS(t, cpu) native_load_tls(t, cpu) #define set_ldt native_set_ldt @@ -112,6 +111,8 @@ static inline void paravirt_free_ldt(str } #endif /* CONFIG_PARAVIRT */ +#define store_ldt(ldt) asm("sldt %0" : "=m"(ldt)) + static inline void native_write_idt_entry(gate_desc *idt, int entry, const gate_desc *gate) { Index: linux-2.6-tip/arch/x86/include/asm/device.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/device.h +++ linux-2.6-tip/arch/x86/include/asm/device.h @@ -6,7 +6,7 @@ struct dev_archdata { void *acpi_handle; #endif #ifdef CONFIG_X86_64 -struct dma_mapping_ops *dma_ops; +struct dma_map_ops *dma_ops; #endif #ifdef CONFIG_DMAR void *iommu; /* hook for IOMMU specific extension */ Index: linux-2.6-tip/arch/x86/include/asm/dma-mapping.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/dma-mapping.h +++ linux-2.6-tip/arch/x86/include/asm/dma-mapping.h @@ -6,7 +6,10 @@ * Documentation/DMA-API.txt for documentation. */ +#include #include +#include +#include #include #include #include @@ -16,47 +19,9 @@ extern int iommu_merge; extern struct device x86_dma_fallback_dev; extern int panic_on_overflow; -struct dma_mapping_ops { - int (*mapping_error)(struct device *dev, - dma_addr_t dma_addr); - void* (*alloc_coherent)(struct device *dev, size_t size, - dma_addr_t *dma_handle, gfp_t gfp); - void (*free_coherent)(struct device *dev, size_t size, - void *vaddr, dma_addr_t dma_handle); - dma_addr_t (*map_single)(struct device *hwdev, phys_addr_t ptr, - size_t size, int direction); - void (*unmap_single)(struct device *dev, dma_addr_t addr, - size_t size, int direction); - void (*sync_single_for_cpu)(struct device *hwdev, - dma_addr_t dma_handle, size_t size, - int direction); - void (*sync_single_for_device)(struct device *hwdev, - dma_addr_t dma_handle, size_t size, - int direction); - void (*sync_single_range_for_cpu)(struct device *hwdev, - dma_addr_t dma_handle, unsigned long offset, - size_t size, int direction); - void (*sync_single_range_for_device)(struct device *hwdev, - dma_addr_t dma_handle, unsigned long offset, - size_t size, int direction); - void (*sync_sg_for_cpu)(struct device *hwdev, - struct scatterlist *sg, int nelems, - int direction); - void (*sync_sg_for_device)(struct device *hwdev, - struct scatterlist *sg, int nelems, - int direction); - int (*map_sg)(struct device *hwdev, struct scatterlist *sg, - int nents, int direction); - void (*unmap_sg)(struct device *hwdev, - struct scatterlist *sg, int nents, - int direction); - int (*dma_supported)(struct device *hwdev, u64 mask); - int is_phys; -}; +extern struct dma_map_ops *dma_ops; -extern struct dma_mapping_ops *dma_ops; - -static inline struct dma_mapping_ops *get_dma_ops(struct device *dev) +static inline struct dma_map_ops *get_dma_ops(struct device *dev) { #ifdef CONFIG_X86_32 return dma_ops; @@ -71,7 +36,7 @@ static inline struct dma_mapping_ops *ge /* Make sure we keep the same behaviour */ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) { - struct dma_mapping_ops *ops = get_dma_ops(dev); + struct dma_map_ops *ops = get_dma_ops(dev); if (ops->mapping_error) return ops->mapping_error(dev, dma_addr); @@ -90,137 +55,174 @@ extern void *dma_generic_alloc_coherent( static inline dma_addr_t dma_map_single(struct device *hwdev, void *ptr, size_t size, - int direction) + enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); + dma_addr_t addr; - BUG_ON(!valid_dma_direction(direction)); - return ops->map_single(hwdev, virt_to_phys(ptr), size, direction); + kmemcheck_mark_initialized(ptr, size); + BUG_ON(!valid_dma_direction(dir)); + addr = ops->map_page(hwdev, virt_to_page(ptr), + (unsigned long)ptr & ~PAGE_MASK, size, + dir, NULL); + debug_dma_map_page(hwdev, virt_to_page(ptr), + (unsigned long)ptr & ~PAGE_MASK, size, + dir, addr, true); + return addr; } static inline void dma_unmap_single(struct device *dev, dma_addr_t addr, size_t size, - int direction) + enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(dev); + struct dma_map_ops *ops = get_dma_ops(dev); - BUG_ON(!valid_dma_direction(direction)); - if (ops->unmap_single) - ops->unmap_single(dev, addr, size, direction); + BUG_ON(!valid_dma_direction(dir)); + if (ops->unmap_page) + ops->unmap_page(dev, addr, size, dir, NULL); + debug_dma_unmap_page(dev, addr, size, dir, true); } static inline int dma_map_sg(struct device *hwdev, struct scatterlist *sg, - int nents, int direction) + int nents, enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); + int ents; + + struct scatterlist *s; + int i; + + for_each_sg(sg, s, nents, i) + kmemcheck_mark_initialized(sg_virt(s), s->length); + BUG_ON(!valid_dma_direction(dir)); + ents = ops->map_sg(hwdev, sg, nents, dir, NULL); + debug_dma_map_sg(hwdev, sg, nents, ents, dir); - BUG_ON(!valid_dma_direction(direction)); - return ops->map_sg(hwdev, sg, nents, direction); + return ents; } static inline void dma_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents, - int direction) + enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); + debug_dma_unmap_sg(hwdev, sg, nents, dir); if (ops->unmap_sg) - ops->unmap_sg(hwdev, sg, nents, direction); + ops->unmap_sg(hwdev, sg, nents, dir, NULL); } static inline void dma_sync_single_for_cpu(struct device *hwdev, dma_addr_t dma_handle, - size_t size, int direction) + size_t size, enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); if (ops->sync_single_for_cpu) - ops->sync_single_for_cpu(hwdev, dma_handle, size, direction); + ops->sync_single_for_cpu(hwdev, dma_handle, size, dir); + debug_dma_sync_single_for_cpu(hwdev, dma_handle, size, dir); flush_write_buffers(); } static inline void dma_sync_single_for_device(struct device *hwdev, dma_addr_t dma_handle, - size_t size, int direction) + size_t size, enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); if (ops->sync_single_for_device) - ops->sync_single_for_device(hwdev, dma_handle, size, direction); + ops->sync_single_for_device(hwdev, dma_handle, size, dir); + debug_dma_sync_single_for_device(hwdev, dma_handle, size, dir); flush_write_buffers(); } static inline void dma_sync_single_range_for_cpu(struct device *hwdev, dma_addr_t dma_handle, - unsigned long offset, size_t size, int direction) + unsigned long offset, size_t size, + enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); if (ops->sync_single_range_for_cpu) ops->sync_single_range_for_cpu(hwdev, dma_handle, offset, - size, direction); + size, dir); + debug_dma_sync_single_range_for_cpu(hwdev, dma_handle, + offset, size, dir); flush_write_buffers(); } static inline void dma_sync_single_range_for_device(struct device *hwdev, dma_addr_t dma_handle, unsigned long offset, size_t size, - int direction) + enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); if (ops->sync_single_range_for_device) ops->sync_single_range_for_device(hwdev, dma_handle, - offset, size, direction); + offset, size, dir); + debug_dma_sync_single_range_for_device(hwdev, dma_handle, + offset, size, dir); flush_write_buffers(); } static inline void dma_sync_sg_for_cpu(struct device *hwdev, struct scatterlist *sg, - int nelems, int direction) + int nelems, enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); if (ops->sync_sg_for_cpu) - ops->sync_sg_for_cpu(hwdev, sg, nelems, direction); + ops->sync_sg_for_cpu(hwdev, sg, nelems, dir); + debug_dma_sync_sg_for_cpu(hwdev, sg, nelems, dir); flush_write_buffers(); } static inline void dma_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg, - int nelems, int direction) + int nelems, enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(hwdev); + struct dma_map_ops *ops = get_dma_ops(hwdev); - BUG_ON(!valid_dma_direction(direction)); + BUG_ON(!valid_dma_direction(dir)); if (ops->sync_sg_for_device) - ops->sync_sg_for_device(hwdev, sg, nelems, direction); + ops->sync_sg_for_device(hwdev, sg, nelems, dir); + debug_dma_sync_sg_for_device(hwdev, sg, nelems, dir); flush_write_buffers(); } static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, size_t offset, size_t size, - int direction) + enum dma_data_direction dir) { - struct dma_mapping_ops *ops = get_dma_ops(dev); + struct dma_map_ops *ops = get_dma_ops(dev); + dma_addr_t addr; - BUG_ON(!valid_dma_direction(direction)); - return ops->map_single(dev, page_to_phys(page) + offset, - size, direction); + kmemcheck_mark_initialized(page_address(page) + offset, size); + BUG_ON(!valid_dma_direction(dir)); + addr = ops->map_page(dev, page, offset, size, dir, NULL); + debug_dma_map_page(dev, page, offset, size, dir, addr, false); + + return addr; } static inline void dma_unmap_page(struct device *dev, dma_addr_t addr, - size_t size, int direction) + size_t size, enum dma_data_direction dir) { - dma_unmap_single(dev, addr, size, direction); + struct dma_map_ops *ops = get_dma_ops(dev); + + BUG_ON(!valid_dma_direction(dir)); + if (ops->unmap_page) + ops->unmap_page(dev, addr, size, dir, NULL); + debug_dma_unmap_page(dev, addr, size, dir, false); } static inline void @@ -266,7 +268,7 @@ static inline void * dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp) { - struct dma_mapping_ops *ops = get_dma_ops(dev); + struct dma_map_ops *ops = get_dma_ops(dev); void *memory; gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); @@ -285,20 +287,24 @@ dma_alloc_coherent(struct device *dev, s if (!ops->alloc_coherent) return NULL; - return ops->alloc_coherent(dev, size, dma_handle, - dma_alloc_coherent_gfp_flags(dev, gfp)); + memory = ops->alloc_coherent(dev, size, dma_handle, + dma_alloc_coherent_gfp_flags(dev, gfp)); + debug_dma_alloc_coherent(dev, size, *dma_handle, memory); + + return memory; } static inline void dma_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t bus) { - struct dma_mapping_ops *ops = get_dma_ops(dev); + struct dma_map_ops *ops = get_dma_ops(dev); WARN_ON(irqs_disabled()); /* for portability */ if (dma_release_from_coherent(dev, get_order(size), vaddr)) return; + debug_dma_free_coherent(dev, size, vaddr, bus); if (ops->free_coherent) ops->free_coherent(dev, size, vaddr, bus); } Index: linux-2.6-tip/arch/x86/include/asm/dmi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/dmi.h +++ linux-2.6-tip/arch/x86/include/asm/dmi.h @@ -1,22 +1,15 @@ #ifndef _ASM_X86_DMI_H #define _ASM_X86_DMI_H -#include - -#define DMI_MAX_DATA 2048 +#include +#include -extern int dmi_alloc_index; -extern char dmi_alloc_data[DMI_MAX_DATA]; +#include +#include -/* This is so early that there is no good way to allocate dynamic memory. - Allocate data in an BSS array. */ -static inline void *dmi_alloc(unsigned len) +static __always_inline __init void *dmi_alloc(unsigned len) { - int idx = dmi_alloc_index; - if ((dmi_alloc_index + len) > DMI_MAX_DATA) - return NULL; - dmi_alloc_index += len; - return dmi_alloc_data + idx; + return extend_brk(len, sizeof(int)); } /* Use early IO mappings for DMI because it's initialized early */ Index: linux-2.6-tip/arch/x86/include/asm/do_timer.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/do_timer.h @@ -0,0 +1,16 @@ +/* defines for inline arch setup functions */ +#include + +#include +#include + +/** + * do_timer_interrupt_hook - hook into timer tick + * + * Call the pit clock event handler. see asm/i8253.h + **/ + +static inline void do_timer_interrupt_hook(void) +{ + global_clock_event->event_handler(global_clock_event); +} Index: linux-2.6-tip/arch/x86/include/asm/e820.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/e820.h +++ linux-2.6-tip/arch/x86/include/asm/e820.h @@ -72,7 +72,7 @@ extern int e820_all_mapped(u64 start, u6 extern void e820_add_region(u64 start, u64 size, int type); extern void e820_print_map(char *who); extern int -sanitize_e820_map(struct e820entry *biosmap, int max_nr_map, int *pnr_map); +sanitize_e820_map(struct e820entry *biosmap, int max_nr_map, u32 *pnr_map); extern u64 e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type); extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type, Index: linux-2.6-tip/arch/x86/include/asm/elf.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/elf.h +++ linux-2.6-tip/arch/x86/include/asm/elf.h @@ -112,7 +112,7 @@ extern unsigned int vdso_enabled; * now struct_user_regs, they are different) */ -#define ELF_CORE_COPY_REGS(pr_reg, regs) \ +#define ELF_CORE_COPY_REGS_COMMON(pr_reg, regs) \ do { \ pr_reg[0] = regs->bx; \ pr_reg[1] = regs->cx; \ @@ -124,7 +124,6 @@ do { \ pr_reg[7] = regs->ds & 0xffff; \ pr_reg[8] = regs->es & 0xffff; \ pr_reg[9] = regs->fs & 0xffff; \ - savesegment(gs, pr_reg[10]); \ pr_reg[11] = regs->orig_ax; \ pr_reg[12] = regs->ip; \ pr_reg[13] = regs->cs & 0xffff; \ @@ -133,6 +132,18 @@ do { \ pr_reg[16] = regs->ss & 0xffff; \ } while (0); +#define ELF_CORE_COPY_REGS(pr_reg, regs) \ +do { \ + ELF_CORE_COPY_REGS_COMMON(pr_reg, regs);\ + pr_reg[10] = get_user_gs(regs); \ +} while (0); + +#define ELF_CORE_COPY_KERNEL_REGS(pr_reg, regs) \ +do { \ + ELF_CORE_COPY_REGS_COMMON(pr_reg, regs);\ + savesegment(gs, pr_reg[10]); \ +} while (0); + #define ELF_PLATFORM (utsname()->machine) #define set_personality_64bit() do { } while (0) Index: linux-2.6-tip/arch/x86/include/asm/entry_arch.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/entry_arch.h @@ -0,0 +1,59 @@ +/* + * This file is designed to contain the BUILD_INTERRUPT specifications for + * all of the extra named interrupt vectors used by the architecture. + * Usually this is the Inter Process Interrupts (IPIs) + */ + +/* + * The following vectors are part of the Linux architecture, there + * is no hardware IRQ pin equivalent for them, they are triggered + * through the ICC by us (IPIs) + */ +#ifdef CONFIG_SMP +BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR) +BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR) +BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR) +BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR) + +BUILD_INTERRUPT3(invalidate_interrupt0,INVALIDATE_TLB_VECTOR_START+0, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt1,INVALIDATE_TLB_VECTOR_START+1, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt2,INVALIDATE_TLB_VECTOR_START+2, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt3,INVALIDATE_TLB_VECTOR_START+3, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt4,INVALIDATE_TLB_VECTOR_START+4, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt5,INVALIDATE_TLB_VECTOR_START+5, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt6,INVALIDATE_TLB_VECTOR_START+6, + smp_invalidate_interrupt) +BUILD_INTERRUPT3(invalidate_interrupt7,INVALIDATE_TLB_VECTOR_START+7, + smp_invalidate_interrupt) +#endif + +BUILD_INTERRUPT(generic_interrupt, GENERIC_INTERRUPT_VECTOR) + +/* + * every pentium local APIC has two 'local interrupts', with a + * soft-definable vector attached to both interrupts, one of + * which is a timer interrupt, the other one is error counter + * overflow. Linux uses the local APIC timer interrupt to get + * a much simpler SMP time architecture: + */ +#ifdef CONFIG_X86_LOCAL_APIC + +BUILD_INTERRUPT(apic_timer_interrupt,LOCAL_TIMER_VECTOR) +BUILD_INTERRUPT(error_interrupt,ERROR_APIC_VECTOR) +BUILD_INTERRUPT(spurious_interrupt,SPURIOUS_APIC_VECTOR) + +#ifdef CONFIG_PERF_COUNTERS +BUILD_INTERRUPT(perf_counter_interrupt, LOCAL_PERF_VECTOR) +#endif + +#ifdef CONFIG_X86_MCE_P4THERMAL +BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR) +#endif + +#endif Index: linux-2.6-tip/arch/x86/include/asm/es7000/apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/es7000/apic.h +++ /dev/null @@ -1,242 +0,0 @@ -#ifndef __ASM_ES7000_APIC_H -#define __ASM_ES7000_APIC_H - -#include - -#define xapic_phys_to_log_apicid(cpu) per_cpu(x86_bios_cpu_apicid, cpu) -#define esr_disable (1) - -static inline int apic_id_registered(void) -{ - return (1); -} - -static inline const cpumask_t *target_cpus_cluster(void) -{ - return &CPU_MASK_ALL; -} - -static inline const cpumask_t *target_cpus(void) -{ - return &cpumask_of_cpu(smp_processor_id()); -} - -#define APIC_DFR_VALUE_CLUSTER (APIC_DFR_CLUSTER) -#define INT_DELIVERY_MODE_CLUSTER (dest_LowestPrio) -#define INT_DEST_MODE_CLUSTER (1) /* logical delivery broadcast to all procs */ -#define NO_BALANCE_IRQ_CLUSTER (1) - -#define APIC_DFR_VALUE (APIC_DFR_FLAT) -#define INT_DELIVERY_MODE (dest_Fixed) -#define INT_DEST_MODE (0) /* phys delivery to target procs */ -#define NO_BALANCE_IRQ (0) -#undef APIC_DEST_LOGICAL -#define APIC_DEST_LOGICAL 0x0 - -static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid) -{ - return 0; -} -static inline unsigned long check_apicid_present(int bit) -{ - return physid_isset(bit, phys_cpu_present_map); -} - -#define apicid_cluster(apicid) (apicid & 0xF0) - -static inline unsigned long calculate_ldr(int cpu) -{ - unsigned long id; - id = xapic_phys_to_log_apicid(cpu); - return (SET_APIC_LOGICAL_ID(id)); -} - -/* - * Set up the logical destination ID. - * - * Intel recommends to set DFR, LdR and TPR before enabling - * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel - * document number 292116). So here it goes... - */ -static inline void init_apic_ldr_cluster(void) -{ - unsigned long val; - int cpu = smp_processor_id(); - - apic_write(APIC_DFR, APIC_DFR_VALUE_CLUSTER); - val = calculate_ldr(cpu); - apic_write(APIC_LDR, val); -} - -static inline void init_apic_ldr(void) -{ - unsigned long val; - int cpu = smp_processor_id(); - - apic_write(APIC_DFR, APIC_DFR_VALUE); - val = calculate_ldr(cpu); - apic_write(APIC_LDR, val); -} - -extern int apic_version [MAX_APICS]; -static inline void setup_apic_routing(void) -{ - int apic = per_cpu(x86_bios_cpu_apicid, smp_processor_id()); - printk("Enabling APIC mode: %s. Using %d I/O APICs, target cpus %lx\n", - (apic_version[apic] == 0x14) ? - "Physical Cluster" : "Logical Cluster", - nr_ioapics, cpus_addr(*target_cpus())[0]); -} - -static inline int multi_timer_check(int apic, int irq) -{ - return 0; -} - -static inline int apicid_to_node(int logical_apicid) -{ - return 0; -} - - -static inline int cpu_present_to_apicid(int mps_cpu) -{ - if (!mps_cpu) - return boot_cpu_physical_apicid; - else if (mps_cpu < nr_cpu_ids) - return (int) per_cpu(x86_bios_cpu_apicid, mps_cpu); - else - return BAD_APICID; -} - -static inline physid_mask_t apicid_to_cpu_present(int phys_apicid) -{ - static int id = 0; - physid_mask_t mask; - mask = physid_mask_of_physid(id); - ++id; - return mask; -} - -extern u8 cpu_2_logical_apicid[]; -/* Mapping from cpu number to logical apicid */ -static inline int cpu_to_logical_apicid(int cpu) -{ -#ifdef CONFIG_SMP - if (cpu >= nr_cpu_ids) - return BAD_APICID; - return (int)cpu_2_logical_apicid[cpu]; -#else - return logical_smp_processor_id(); -#endif -} - -static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map) -{ - /* For clustered we don't have a good way to do this yet - hack */ - return physids_promote(0xff); -} - - -static inline void setup_portio_remap(void) -{ -} - -extern unsigned int boot_cpu_physical_apicid; -static inline int check_phys_apicid_present(int cpu_physical_apicid) -{ - boot_cpu_physical_apicid = read_apic_id(); - return (1); -} - -static inline unsigned int -cpu_mask_to_apicid_cluster(const struct cpumask *cpumask) -{ - int num_bits_set; - int cpus_found = 0; - int cpu; - int apicid; - - num_bits_set = cpumask_weight(cpumask); - /* Return id to all */ - if (num_bits_set == nr_cpu_ids) - return 0xFF; - /* - * The cpus in the mask must all be on the apic cluster. If are not - * on the same apicid cluster return default value of TARGET_CPUS. - */ - cpu = cpumask_first(cpumask); - apicid = cpu_to_logical_apicid(cpu); - while (cpus_found < num_bits_set) { - if (cpumask_test_cpu(cpu, cpumask)) { - int new_apicid = cpu_to_logical_apicid(cpu); - if (apicid_cluster(apicid) != - apicid_cluster(new_apicid)){ - printk ("%s: Not a valid mask!\n", __func__); - return 0xFF; - } - apicid = new_apicid; - cpus_found++; - } - cpu++; - } - return apicid; -} - -static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask) -{ - int num_bits_set; - int cpus_found = 0; - int cpu; - int apicid; - - num_bits_set = cpus_weight(*cpumask); - /* Return id to all */ - if (num_bits_set == nr_cpu_ids) - return cpu_to_logical_apicid(0); - /* - * The cpus in the mask must all be on the apic cluster. If are not - * on the same apicid cluster return default value of TARGET_CPUS. - */ - cpu = first_cpu(*cpumask); - apicid = cpu_to_logical_apicid(cpu); - while (cpus_found < num_bits_set) { - if (cpu_isset(cpu, *cpumask)) { - int new_apicid = cpu_to_logical_apicid(cpu); - if (apicid_cluster(apicid) != - apicid_cluster(new_apicid)){ - printk ("%s: Not a valid mask!\n", __func__); - return cpu_to_logical_apicid(0); - } - apicid = new_apicid; - cpus_found++; - } - cpu++; - } - return apicid; -} - - -static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *inmask, - const struct cpumask *andmask) -{ - int apicid = cpu_to_logical_apicid(0); - cpumask_var_t cpumask; - - if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC)) - return apicid; - - cpumask_and(cpumask, inmask, andmask); - cpumask_and(cpumask, cpumask, cpu_online_mask); - apicid = cpu_mask_to_apicid(cpumask); - - free_cpumask_var(cpumask); - return apicid; -} - -static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb) -{ - return cpuid_apic >> index_msb; -} - -#endif /* __ASM_ES7000_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/es7000/apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/es7000/apicdef.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_ES7000_APICDEF_H -#define __ASM_ES7000_APICDEF_H - -#define APIC_ID_MASK (0xFF<<24) - -static inline unsigned get_apic_id(unsigned long x) -{ - return (((x)>>24)&0xFF); -} - -#define GET_APIC_ID(x) get_apic_id(x) - -#endif Index: linux-2.6-tip/arch/x86/include/asm/es7000/ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/es7000/ipi.h +++ /dev/null @@ -1,22 +0,0 @@ -#ifndef __ASM_ES7000_IPI_H -#define __ASM_ES7000_IPI_H - -void send_IPI_mask_sequence(const struct cpumask *mask, int vector); -void send_IPI_mask_allbutself(const struct cpumask *mask, int vector); - -static inline void send_IPI_mask(const struct cpumask *mask, int vector) -{ - send_IPI_mask_sequence(mask, vector); -} - -static inline void send_IPI_allbutself(int vector) -{ - send_IPI_mask_allbutself(cpu_online_mask, vector); -} - -static inline void send_IPI_all(int vector) -{ - send_IPI_mask(cpu_online_mask, vector); -} - -#endif /* __ASM_ES7000_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/es7000/mpparse.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/es7000/mpparse.h +++ /dev/null @@ -1,29 +0,0 @@ -#ifndef __ASM_ES7000_MPPARSE_H -#define __ASM_ES7000_MPPARSE_H - -#include - -extern int parse_unisys_oem (char *oemptr); -extern int find_unisys_acpi_oem_table(unsigned long *oem_addr); -extern void unmap_unisys_acpi_oem_table(unsigned long oem_addr); -extern void setup_unisys(void); - -#ifndef CONFIG_X86_GENERICARCH -extern int acpi_madt_oem_check(char *oem_id, char *oem_table_id); -extern int mps_oem_check(struct mpc_table *mpc, char *oem, char *productid); -#endif - -#ifdef CONFIG_ACPI - -static inline int es7000_check_dsdt(void) -{ - struct acpi_table_header header; - - if (ACPI_SUCCESS(acpi_get_table_header(ACPI_SIG_DSDT, 0, &header)) && - !strncmp(header.oem_id, "UNISYS", 6)) - return 1; - return 0; -} -#endif - -#endif /* __ASM_MACH_MPPARSE_H */ Index: linux-2.6-tip/arch/x86/include/asm/es7000/wakecpu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/es7000/wakecpu.h +++ /dev/null @@ -1,37 +0,0 @@ -#ifndef __ASM_ES7000_WAKECPU_H -#define __ASM_ES7000_WAKECPU_H - -#define TRAMPOLINE_PHYS_LOW 0x467 -#define TRAMPOLINE_PHYS_HIGH 0x469 - -static inline void wait_for_init_deassert(atomic_t *deassert) -{ -#ifndef CONFIG_ES7000_CLUSTERED_APIC - while (!atomic_read(deassert)) - cpu_relax(); -#endif - return; -} - -/* Nothing to do for most platforms, since cleared by the INIT cycle */ -static inline void smp_callin_clear_local_apic(void) -{ -} - -static inline void store_NMI_vector(unsigned short *high, unsigned short *low) -{ -} - -static inline void restore_NMI_vector(unsigned short *high, unsigned short *low) -{ -} - -extern void __inquire_remote_apic(int apicid); - -static inline void inquire_remote_apic(int apicid) -{ - if (apic_verbosity >= APIC_DEBUG) - __inquire_remote_apic(apicid); -} - -#endif /* __ASM_MACH_WAKECPU_H */ Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h +++ linux-2.6-tip/arch/x86/include/asm/fixmap.h @@ -1,11 +1,147 @@ +/* + * fixmap.h: compile-time virtual memory allocation + * + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (C) 1998 Ingo Molnar + * + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 + * x86_32 and x86_64 integration by Gustavo F. Padovan, February 2009 + */ + #ifndef _ASM_X86_FIXMAP_H #define _ASM_X86_FIXMAP_H +#ifndef __ASSEMBLY__ +#include +#include +#include +#include +#ifdef CONFIG_X86_32 +#include +#include +#else +#include +#endif + +/* + * We can't declare FIXADDR_TOP as variable for x86_64 because vsyscall + * uses fixmaps that relies on FIXADDR_TOP for proper address calculation. + * Because of this, FIXADDR_TOP x86 integration was left as later work. + */ +#ifdef CONFIG_X86_32 +/* used by vmalloc.c, vsyscall.lds.S. + * + * Leave one empty page between vmalloc'ed areas and + * the start of the fixmap. + */ +extern unsigned long __FIXADDR_TOP; +#define FIXADDR_TOP ((unsigned long)__FIXADDR_TOP) + +#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) +#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) +#else +#define FIXADDR_TOP (VSYSCALL_END-PAGE_SIZE) + +/* Only covers 32bit vsyscalls currently. Need another set for 64bit. */ +#define FIXADDR_USER_START ((unsigned long)VSYSCALL32_VSYSCALL) +#define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE) +#endif + + +/* + * Here we define all the compile-time 'special' virtual + * addresses. The point is to have a constant address at + * compile time, but to set the physical address only + * in the boot process. + * for x86_32: We allocate these special addresses + * from the end of virtual memory (0xfffff000) backwards. + * Also this lets us do fail-safe vmalloc(), we + * can guarantee that these special addresses and + * vmalloc()-ed addresses never overlap. + * + * These 'compile-time allocated' memory buffers are + * fixed-size 4k pages (or larger if used with an increment + * higher than 1). Use set_fixmap(idx,phys) to associate + * physical memory with fixmap indices. + * + * TLB entries of such buffers will not be flushed across + * task switches. + */ +enum fixed_addresses { #ifdef CONFIG_X86_32 -# include "fixmap_32.h" + FIX_HOLE, + FIX_VDSO, #else -# include "fixmap_64.h" + VSYSCALL_LAST_PAGE, + VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE + + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1, + VSYSCALL_HPET, #endif + FIX_DBGP_BASE, + FIX_EARLYCON_MEM_BASE, +#ifdef CONFIG_X86_LOCAL_APIC + FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */ +#endif +#ifdef CONFIG_X86_IO_APIC + FIX_IO_APIC_BASE_0, + FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1, +#endif +#ifdef CONFIG_X86_VISWS_APIC + FIX_CO_CPU, /* Cobalt timer */ + FIX_CO_APIC, /* Cobalt APIC Redirection Table */ + FIX_LI_PCIA, /* Lithium PCI Bridge A */ + FIX_LI_PCIB, /* Lithium PCI Bridge B */ +#endif +#ifdef CONFIG_X86_F00F_BUG + FIX_F00F_IDT, /* Virtual mapping for IDT */ +#endif +#ifdef CONFIG_X86_CYCLONE_TIMER + FIX_CYCLONE_TIMER, /*cyclone timer register*/ +#endif +#ifdef CONFIG_X86_32 + FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */ + FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1, +#ifdef CONFIG_PCI_MMCONFIG + FIX_PCIE_MCFG, +#endif +#endif +#ifdef CONFIG_PARAVIRT + FIX_PARAVIRT_BOOTMAP, +#endif + FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */ + FIX_TEXT_POKE1, + __end_of_permanent_fixed_addresses, +#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT + FIX_OHCI1394_BASE, +#endif + /* + * 256 temporary boot-time mappings, used by early_ioremap(), + * before ioremap() is functional. + * + * We round it up to the next 256 pages boundary so that we + * can have a single pgd entry and a single pte table: + */ +#define NR_FIX_BTMAPS 64 +#define FIX_BTMAPS_SLOTS 4 + FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 - + (__end_of_permanent_fixed_addresses & 255), + FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1, +#ifdef CONFIG_X86_32 + FIX_WP_TEST, +#endif + __end_of_fixed_addresses +}; + + +extern void reserve_top_address(unsigned long reserve); + +#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT) +#define FIXADDR_BOOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) +#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) +#define FIXADDR_BOOT_START (FIXADDR_TOP - FIXADDR_BOOT_SIZE) extern int fixmaps_set; @@ -15,11 +151,11 @@ extern pte_t *pkmap_page_table; void __native_set_fixmap(enum fixed_addresses idx, pte_t pte); void native_set_fixmap(enum fixed_addresses idx, - unsigned long phys, pgprot_t flags); + phys_addr_t phys, pgprot_t flags); #ifndef CONFIG_PARAVIRT static inline void __set_fixmap(enum fixed_addresses idx, - unsigned long phys, pgprot_t flags) + phys_addr_t phys, pgprot_t flags) { native_set_fixmap(idx, phys, flags); } @@ -69,4 +205,5 @@ static inline unsigned long virt_to_fix( BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); return __virt_to_fix(vaddr); } +#endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_FIXMAP_H */ Index: linux-2.6-tip/arch/x86/include/asm/fixmap_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap_32.h +++ /dev/null @@ -1,119 +0,0 @@ -/* - * fixmap.h: compile-time virtual memory allocation - * - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - * - * Copyright (C) 1998 Ingo Molnar - * - * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 - */ - -#ifndef _ASM_X86_FIXMAP_32_H -#define _ASM_X86_FIXMAP_32_H - - -/* used by vmalloc.c, vsyscall.lds.S. - * - * Leave one empty page between vmalloc'ed areas and - * the start of the fixmap. - */ -extern unsigned long __FIXADDR_TOP; -#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) -#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) - -#ifndef __ASSEMBLY__ -#include -#include -#include -#include -#include -#include - -/* - * Here we define all the compile-time 'special' virtual - * addresses. The point is to have a constant address at - * compile time, but to set the physical address only - * in the boot process. We allocate these special addresses - * from the end of virtual memory (0xfffff000) backwards. - * Also this lets us do fail-safe vmalloc(), we - * can guarantee that these special addresses and - * vmalloc()-ed addresses never overlap. - * - * these 'compile-time allocated' memory buffers are - * fixed-size 4k pages. (or larger if used with an increment - * highger than 1) use fixmap_set(idx,phys) to associate - * physical memory with fixmap indices. - * - * TLB entries of such buffers will not be flushed across - * task switches. - */ -enum fixed_addresses { - FIX_HOLE, - FIX_VDSO, - FIX_DBGP_BASE, - FIX_EARLYCON_MEM_BASE, -#ifdef CONFIG_X86_LOCAL_APIC - FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */ -#endif -#ifdef CONFIG_X86_IO_APIC - FIX_IO_APIC_BASE_0, - FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS-1, -#endif -#ifdef CONFIG_X86_VISWS_APIC - FIX_CO_CPU, /* Cobalt timer */ - FIX_CO_APIC, /* Cobalt APIC Redirection Table */ - FIX_LI_PCIA, /* Lithium PCI Bridge A */ - FIX_LI_PCIB, /* Lithium PCI Bridge B */ -#endif -#ifdef CONFIG_X86_F00F_BUG - FIX_F00F_IDT, /* Virtual mapping for IDT */ -#endif -#ifdef CONFIG_X86_CYCLONE_TIMER - FIX_CYCLONE_TIMER, /*cyclone timer register*/ -#endif - FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */ - FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1, -#ifdef CONFIG_PCI_MMCONFIG - FIX_PCIE_MCFG, -#endif -#ifdef CONFIG_PARAVIRT - FIX_PARAVIRT_BOOTMAP, -#endif - __end_of_permanent_fixed_addresses, - /* - * 256 temporary boot-time mappings, used by early_ioremap(), - * before ioremap() is functional. - * - * We round it up to the next 256 pages boundary so that we - * can have a single pgd entry and a single pte table: - */ -#define NR_FIX_BTMAPS 64 -#define FIX_BTMAPS_SLOTS 4 - FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 - - (__end_of_permanent_fixed_addresses & 255), - FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1, - FIX_WP_TEST, -#ifdef CONFIG_ACPI - FIX_ACPI_BEGIN, - FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1, -#endif -#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT - FIX_OHCI1394_BASE, -#endif - __end_of_fixed_addresses -}; - -extern void reserve_top_address(unsigned long reserve); - - -#define FIXADDR_TOP ((unsigned long)__FIXADDR_TOP) - -#define __FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT) -#define __FIXADDR_BOOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) -#define FIXADDR_START (FIXADDR_TOP - __FIXADDR_SIZE) -#define FIXADDR_BOOT_START (FIXADDR_TOP - __FIXADDR_BOOT_SIZE) - -#endif /* !__ASSEMBLY__ */ -#endif /* _ASM_X86_FIXMAP_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/fixmap_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap_64.h +++ /dev/null @@ -1,79 +0,0 @@ -/* - * fixmap.h: compile-time virtual memory allocation - * - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - * - * Copyright (C) 1998 Ingo Molnar - */ - -#ifndef _ASM_X86_FIXMAP_64_H -#define _ASM_X86_FIXMAP_64_H - -#include -#include -#include -#include -#include - -/* - * Here we define all the compile-time 'special' virtual - * addresses. The point is to have a constant address at - * compile time, but to set the physical address only - * in the boot process. - * - * These 'compile-time allocated' memory buffers are - * fixed-size 4k pages (or larger if used with an increment - * higher than 1). Use set_fixmap(idx,phys) to associate - * physical memory with fixmap indices. - * - * TLB entries of such buffers will not be flushed across - * task switches. - */ - -enum fixed_addresses { - VSYSCALL_LAST_PAGE, - VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE - + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1, - VSYSCALL_HPET, - FIX_DBGP_BASE, - FIX_EARLYCON_MEM_BASE, - FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */ - FIX_IO_APIC_BASE_0, - FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1, -#ifdef CONFIG_PARAVIRT - FIX_PARAVIRT_BOOTMAP, -#endif - __end_of_permanent_fixed_addresses, -#ifdef CONFIG_ACPI - FIX_ACPI_BEGIN, - FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1, -#endif -#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT - FIX_OHCI1394_BASE, -#endif - /* - * 256 temporary boot-time mappings, used by early_ioremap(), - * before ioremap() is functional. - * - * We round it up to the next 256 pages boundary so that we - * can have a single pgd entry and a single pte table: - */ -#define NR_FIX_BTMAPS 64 -#define FIX_BTMAPS_SLOTS 4 - FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 - - (__end_of_permanent_fixed_addresses & 255), - FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1, - __end_of_fixed_addresses -}; - -#define FIXADDR_TOP (VSYSCALL_END-PAGE_SIZE) -#define FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) -#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) - -/* Only covers 32bit vsyscalls currently. Need another set for 64bit. */ -#define FIXADDR_USER_START ((unsigned long)VSYSCALL32_VSYSCALL) -#define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE) - -#endif /* _ASM_X86_FIXMAP_64_H */ Index: linux-2.6-tip/arch/x86/include/asm/ftrace.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/ftrace.h +++ linux-2.6-tip/arch/x86/include/asm/ftrace.h @@ -28,6 +28,13 @@ #endif +/* FIXME: I don't want to stay hardcoded */ +#ifdef CONFIG_X86_64 +# define FTRACE_SYSCALL_MAX 296 +#else +# define FTRACE_SYSCALL_MAX 333 +#endif + #ifdef CONFIG_FUNCTION_TRACER #define MCOUNT_ADDR ((long)(mcount)) #define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */ @@ -55,29 +62,4 @@ struct dyn_arch_ftrace { #endif /* __ASSEMBLY__ */ #endif /* CONFIG_FUNCTION_TRACER */ -#ifdef CONFIG_FUNCTION_GRAPH_TRACER - -#ifndef __ASSEMBLY__ - -/* - * Stack of return addresses for functions - * of a thread. - * Used in struct thread_info - */ -struct ftrace_ret_stack { - unsigned long ret; - unsigned long func; - unsigned long long calltime; -}; - -/* - * Primary handler of a function return. - * It relays on ftrace_return_to_handler. - * Defined in entry_32/64.S - */ -extern void return_to_handler(void); - -#endif /* __ASSEMBLY__ */ -#endif /* CONFIG_FUNCTION_GRAPH_TRACER */ - #endif /* _ASM_X86_FTRACE_H */ Index: linux-2.6-tip/arch/x86/include/asm/genapic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/genapic.h +++ linux-2.6-tip/arch/x86/include/asm/genapic.h @@ -1,5 +1 @@ -#ifdef CONFIG_X86_32 -# include "genapic_32.h" -#else -# include "genapic_64.h" -#endif +#include Index: linux-2.6-tip/arch/x86/include/asm/genapic_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/genapic_32.h +++ /dev/null @@ -1,148 +0,0 @@ -#ifndef _ASM_X86_GENAPIC_32_H -#define _ASM_X86_GENAPIC_32_H - -#include -#include - -/* - * Generic APIC driver interface. - * - * An straight forward mapping of the APIC related parts of the - * x86 subarchitecture interface to a dynamic object. - * - * This is used by the "generic" x86 subarchitecture. - * - * Copyright 2003 Andi Kleen, SuSE Labs. - */ - -struct mpc_bus; -struct mpc_table; -struct mpc_cpu; - -struct genapic { - char *name; - int (*probe)(void); - - int (*apic_id_registered)(void); - const struct cpumask *(*target_cpus)(void); - int int_delivery_mode; - int int_dest_mode; - int ESR_DISABLE; - int apic_destination_logical; - unsigned long (*check_apicid_used)(physid_mask_t bitmap, int apicid); - unsigned long (*check_apicid_present)(int apicid); - int no_balance_irq; - int no_ioapic_check; - void (*init_apic_ldr)(void); - physid_mask_t (*ioapic_phys_id_map)(physid_mask_t map); - - void (*setup_apic_routing)(void); - int (*multi_timer_check)(int apic, int irq); - int (*apicid_to_node)(int logical_apicid); - int (*cpu_to_logical_apicid)(int cpu); - int (*cpu_present_to_apicid)(int mps_cpu); - physid_mask_t (*apicid_to_cpu_present)(int phys_apicid); - void (*setup_portio_remap)(void); - int (*check_phys_apicid_present)(int boot_cpu_physical_apicid); - void (*enable_apic_mode)(void); - u32 (*phys_pkg_id)(u32 cpuid_apic, int index_msb); - - /* mpparse */ - /* When one of the next two hooks returns 1 the genapic - is switched to this. Essentially they are additional probe - functions. */ - int (*mps_oem_check)(struct mpc_table *mpc, char *oem, - char *productid); - int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id); - - unsigned (*get_apic_id)(unsigned long x); - unsigned long apic_id_mask; - unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask); - unsigned int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask, - const struct cpumask *andmask); - void (*vector_allocation_domain)(int cpu, struct cpumask *retmask); - -#ifdef CONFIG_SMP - /* ipi */ - void (*send_IPI_mask)(const struct cpumask *mask, int vector); - void (*send_IPI_mask_allbutself)(const struct cpumask *mask, - int vector); - void (*send_IPI_allbutself)(int vector); - void (*send_IPI_all)(int vector); -#endif - int (*wakeup_cpu)(int apicid, unsigned long start_eip); - int trampoline_phys_low; - int trampoline_phys_high; - void (*wait_for_init_deassert)(atomic_t *deassert); - void (*smp_callin_clear_local_apic)(void); - void (*store_NMI_vector)(unsigned short *high, unsigned short *low); - void (*restore_NMI_vector)(unsigned short *high, unsigned short *low); - void (*inquire_remote_apic)(int apicid); -}; - -#define APICFUNC(x) .x = x, - -/* More functions could be probably marked IPIFUNC and save some space - in UP GENERICARCH kernels, but I don't have the nerve right now - to untangle this mess. -AK */ -#ifdef CONFIG_SMP -#define IPIFUNC(x) APICFUNC(x) -#else -#define IPIFUNC(x) -#endif - -#define APIC_INIT(aname, aprobe) \ -{ \ - .name = aname, \ - .probe = aprobe, \ - .int_delivery_mode = INT_DELIVERY_MODE, \ - .int_dest_mode = INT_DEST_MODE, \ - .no_balance_irq = NO_BALANCE_IRQ, \ - .ESR_DISABLE = esr_disable, \ - .apic_destination_logical = APIC_DEST_LOGICAL, \ - APICFUNC(apic_id_registered) \ - APICFUNC(target_cpus) \ - APICFUNC(check_apicid_used) \ - APICFUNC(check_apicid_present) \ - APICFUNC(init_apic_ldr) \ - APICFUNC(ioapic_phys_id_map) \ - APICFUNC(setup_apic_routing) \ - APICFUNC(multi_timer_check) \ - APICFUNC(apicid_to_node) \ - APICFUNC(cpu_to_logical_apicid) \ - APICFUNC(cpu_present_to_apicid) \ - APICFUNC(apicid_to_cpu_present) \ - APICFUNC(setup_portio_remap) \ - APICFUNC(check_phys_apicid_present) \ - APICFUNC(mps_oem_check) \ - APICFUNC(get_apic_id) \ - .apic_id_mask = APIC_ID_MASK, \ - APICFUNC(cpu_mask_to_apicid) \ - APICFUNC(cpu_mask_to_apicid_and) \ - APICFUNC(vector_allocation_domain) \ - APICFUNC(acpi_madt_oem_check) \ - IPIFUNC(send_IPI_mask) \ - IPIFUNC(send_IPI_allbutself) \ - IPIFUNC(send_IPI_all) \ - APICFUNC(enable_apic_mode) \ - APICFUNC(phys_pkg_id) \ - .trampoline_phys_low = TRAMPOLINE_PHYS_LOW, \ - .trampoline_phys_high = TRAMPOLINE_PHYS_HIGH, \ - APICFUNC(wait_for_init_deassert) \ - APICFUNC(smp_callin_clear_local_apic) \ - APICFUNC(store_NMI_vector) \ - APICFUNC(restore_NMI_vector) \ - APICFUNC(inquire_remote_apic) \ -} - -extern struct genapic *genapic; -extern void es7000_update_genapic_to_cluster(void); - -enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC}; -#define get_uv_system_type() UV_NONE -#define is_uv_system() 0 -#define uv_wakeup_secondary(a, b) 1 -#define uv_system_init() do {} while (0) - - -#endif /* _ASM_X86_GENAPIC_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/genapic_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/genapic_64.h +++ /dev/null @@ -1,66 +0,0 @@ -#ifndef _ASM_X86_GENAPIC_64_H -#define _ASM_X86_GENAPIC_64_H - -#include - -/* - * Copyright 2004 James Cleverdon, IBM. - * Subject to the GNU Public License, v.2 - * - * Generic APIC sub-arch data struct. - * - * Hacked for x86-64 by James Cleverdon from i386 architecture code by - * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and - * James Cleverdon. - */ - -struct genapic { - char *name; - int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id); - u32 int_delivery_mode; - u32 int_dest_mode; - int (*apic_id_registered)(void); - const struct cpumask *(*target_cpus)(void); - void (*vector_allocation_domain)(int cpu, struct cpumask *retmask); - void (*init_apic_ldr)(void); - /* ipi */ - void (*send_IPI_mask)(const struct cpumask *mask, int vector); - void (*send_IPI_mask_allbutself)(const struct cpumask *mask, - int vector); - void (*send_IPI_allbutself)(int vector); - void (*send_IPI_all)(int vector); - void (*send_IPI_self)(int vector); - /* */ - unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask); - unsigned int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask, - const struct cpumask *andmask); - unsigned int (*phys_pkg_id)(int index_msb); - unsigned int (*get_apic_id)(unsigned long x); - unsigned long (*set_apic_id)(unsigned int id); - unsigned long apic_id_mask; - /* wakeup_secondary_cpu */ - int (*wakeup_cpu)(int apicid, unsigned long start_eip); -}; - -extern struct genapic *genapic; - -extern struct genapic apic_flat; -extern struct genapic apic_physflat; -extern struct genapic apic_x2apic_cluster; -extern struct genapic apic_x2apic_phys; -extern int acpi_madt_oem_check(char *, char *); - -extern void apic_send_IPI_self(int vector); -enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC}; -extern enum uv_system_type get_uv_system_type(void); -extern int is_uv_system(void); - -extern struct genapic apic_x2apic_uv_x; -DECLARE_PER_CPU(int, x2apic_extra_bits); -extern void uv_cpu_init(void); -extern void uv_system_init(void); -extern int uv_wakeup_secondary(int phys_apicid, unsigned int start_rip); - -extern void setup_apic_routing(void); - -#endif /* _ASM_X86_GENAPIC_64_H */ Index: linux-2.6-tip/arch/x86/include/asm/hardirq.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/hardirq.h +++ linux-2.6-tip/arch/x86/include/asm/hardirq.h @@ -1,11 +1,54 @@ -#ifdef CONFIG_X86_32 -# include "hardirq_32.h" -#else -# include "hardirq_64.h" +#ifndef _ASM_X86_HARDIRQ_H +#define _ASM_X86_HARDIRQ_H + +#include +#include + +typedef struct { + unsigned int __softirq_pending; + unsigned int __nmi_count; /* arch dependent */ + unsigned int irq0_irqs; +#ifdef CONFIG_X86_LOCAL_APIC + unsigned int apic_timer_irqs; /* arch dependent */ + unsigned int irq_spurious_count; +#endif + unsigned int generic_irqs; /* arch dependent */ + unsigned int apic_perf_irqs; +#ifdef CONFIG_SMP + unsigned int irq_resched_count; + unsigned int irq_call_count; + unsigned int irq_tlb_count; +#endif +#ifdef CONFIG_X86_MCE + unsigned int irq_thermal_count; +# ifdef CONFIG_X86_64 + unsigned int irq_threshold_count; +# endif #endif +} ____cacheline_aligned irq_cpustat_t; + +DECLARE_PER_CPU(irq_cpustat_t, irq_stat); + +/* We can have at most NR_VECTORS irqs routed to a cpu at a time */ +#define MAX_HARDIRQS_PER_CPU NR_VECTORS + +#define __ARCH_IRQ_STAT + +#define inc_irq_stat(member) percpu_add(irq_stat.member, 1) + +#define local_softirq_pending() percpu_read(irq_stat.__softirq_pending) + +#define __ARCH_SET_SOFTIRQ_PENDING + +#define set_softirq_pending(x) percpu_write(irq_stat.__softirq_pending, (x)) +#define or_softirq_pending(x) percpu_or(irq_stat.__softirq_pending, (x)) + +extern void ack_bad_irq(unsigned int irq); extern u64 arch_irq_stat_cpu(unsigned int cpu); #define arch_irq_stat_cpu arch_irq_stat_cpu extern u64 arch_irq_stat(void); #define arch_irq_stat arch_irq_stat + +#endif /* _ASM_X86_HARDIRQ_H */ Index: linux-2.6-tip/arch/x86/include/asm/hardirq_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/hardirq_32.h +++ /dev/null @@ -1,30 +0,0 @@ -#ifndef _ASM_X86_HARDIRQ_32_H -#define _ASM_X86_HARDIRQ_32_H - -#include -#include - -typedef struct { - unsigned int __softirq_pending; - unsigned long idle_timestamp; - unsigned int __nmi_count; /* arch dependent */ - unsigned int apic_timer_irqs; /* arch dependent */ - unsigned int irq0_irqs; - unsigned int irq_resched_count; - unsigned int irq_call_count; - unsigned int irq_tlb_count; - unsigned int irq_thermal_count; - unsigned int irq_spurious_count; -} ____cacheline_aligned irq_cpustat_t; - -DECLARE_PER_CPU(irq_cpustat_t, irq_stat); - -#define __ARCH_IRQ_STAT -#define __IRQ_STAT(cpu, member) (per_cpu(irq_stat, cpu).member) - -#define inc_irq_stat(member) (__get_cpu_var(irq_stat).member++) - -void ack_bad_irq(unsigned int irq); -#include - -#endif /* _ASM_X86_HARDIRQ_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/hardirq_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/hardirq_64.h +++ /dev/null @@ -1,25 +0,0 @@ -#ifndef _ASM_X86_HARDIRQ_64_H -#define _ASM_X86_HARDIRQ_64_H - -#include -#include -#include -#include - -/* We can have at most NR_VECTORS irqs routed to a cpu at a time */ -#define MAX_HARDIRQS_PER_CPU NR_VECTORS - -#define __ARCH_IRQ_STAT 1 - -#define inc_irq_stat(member) add_pda(member, 1) - -#define local_softirq_pending() read_pda(__softirq_pending) - -#define __ARCH_SET_SOFTIRQ_PENDING 1 - -#define set_softirq_pending(x) write_pda(__softirq_pending, (x)) -#define or_softirq_pending(x) or_pda(__softirq_pending, (x)) - -extern void ack_bad_irq(unsigned int irq); - -#endif /* _ASM_X86_HARDIRQ_64_H */ Index: linux-2.6-tip/arch/x86/include/asm/highmem.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/highmem.h +++ linux-2.6-tip/arch/x86/include/asm/highmem.h @@ -58,15 +58,28 @@ extern void *kmap_high(struct page *page extern void kunmap_high(struct page *page); void *kmap(struct page *page); +extern void kunmap_virt(void *ptr); +extern struct page *kmap_to_page(void *ptr); +void kunmap(struct page *page); + +void *__kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot); +void *__kmap_atomic(struct page *page, enum km_type type); +void *__kmap_atomic_direct(struct page *page, enum km_type type); +void __kunmap_atomic(void *kvaddr, enum km_type type); +void *__kmap_atomic_pfn(unsigned long pfn, enum km_type type); +struct page *__kmap_atomic_to_page(void *ptr); + void kunmap(struct page *page); void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot); void *kmap_atomic(struct page *page, enum km_type type); void kunmap_atomic(void *kvaddr, enum km_type type); void *kmap_atomic_pfn(unsigned long pfn, enum km_type type); +void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot); struct page *kmap_atomic_to_page(void *ptr); #ifndef CONFIG_PARAVIRT -#define kmap_atomic_pte(page, type) kmap_atomic(page, type) +#define kmap_atomic_pte(page, type) kmap_atomic(page, type) +#define kmap_atomic_pte_direct(page, type) kmap_atomic_direct(page, type) #endif #define flush_cache_kmaps() do { } while (0) @@ -74,6 +87,27 @@ struct page *kmap_atomic_to_page(void *p extern void add_highpages_with_active_regions(int nid, unsigned long start_pfn, unsigned long end_pfn); +/* + * on PREEMPT_RT kmap_atomic() is a wrapper that uses kmap(): + */ +#ifdef CONFIG_PREEMPT_RT +# define kmap_atomic_prot(page, type, prot) ({ pagefault_disable(); kmap(page); }) +# define kmap_atomic(page, type) ({ pagefault_disable(); kmap(page); }) +# define kmap_atomic_pfn(pfn, type) kmap(pfn_to_page(pfn)) +# define kunmap_atomic(kvaddr, type) do { pagefault_enable(); kunmap_virt(kvaddr); } while(0) +# define kmap_atomic_to_page(kvaddr) kmap_to_page(kvaddr) +# define kmap_atomic_direct(page, type) __kmap_atomic_direct(page, type) +# define kunmap_atomic_direct(kvaddr, type) __kunmap_atomic(kvaddr, type) +#else +# define kmap_atomic_prot(page, type, prot) __kmap_atomic_prot(page, type, prot) +# define kmap_atomic(page, type) __kmap_atomic(page, type) +# define kmap_atomic_pfn(pfn, type) __kmap_atomic_pfn(pfn, type) +# define kunmap_atomic(kvaddr, type) __kunmap_atomic(kvaddr, type) +# define kmap_atomic_to_page(kvaddr) __kmap_atomic_to_page(kvaddr) +# define kmap_atomic_direct(page, type) __kmap_atomic(page, type) +# define kunmap_atomic_direct(kvaddr, type) __kunmap_atomic(kvaddr, type) +#endif + #endif /* __KERNEL__ */ #endif /* _ASM_X86_HIGHMEM_H */ Index: linux-2.6-tip/arch/x86/include/asm/hw_irq.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/hw_irq.h +++ linux-2.6-tip/arch/x86/include/asm/hw_irq.h @@ -25,11 +25,12 @@ #include #include -#define platform_legacy_irq(irq) ((irq) < 16) - /* Interrupt handlers registered during init_IRQ */ extern void apic_timer_interrupt(void); +extern void generic_interrupt(void); extern void error_interrupt(void); +extern void perf_counter_interrupt(void); + extern void spurious_interrupt(void); extern void thermal_interrupt(void); extern void reschedule_interrupt(void); @@ -58,7 +59,7 @@ extern void make_8259A_irq(unsigned int extern void init_8259A(int aeoi); /* IOAPIC */ -#define IO_APIC_IRQ(x) (((x) >= 16) || ((1<<(x)) & io_apic_irqs)) +#define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs)) extern unsigned long io_apic_irqs; extern void init_VISWS_APIC_irqs(void); @@ -67,15 +68,7 @@ extern void disable_IO_APIC(void); extern int IO_APIC_get_PCI_irq_vector(int bus, int slot, int fn); extern void setup_ioapic_dest(void); -#ifdef CONFIG_X86_64 extern void enable_IO_APIC(void); -#endif - -/* IPI functions */ -#ifdef CONFIG_X86_32 -extern void send_IPI_self(int vector); -#endif -extern void send_IPI(int dest, int vector); /* Statistics */ extern atomic_t irq_err_count; @@ -84,21 +77,11 @@ extern atomic_t irq_mis_count; /* EISA */ extern void eisa_set_level_irq(unsigned int irq); -/* Voyager functions */ -extern asmlinkage void vic_cpi_interrupt(void); -extern asmlinkage void vic_sys_interrupt(void); -extern asmlinkage void vic_cmn_interrupt(void); -extern asmlinkage void qic_timer_interrupt(void); -extern asmlinkage void qic_invalidate_interrupt(void); -extern asmlinkage void qic_reschedule_interrupt(void); -extern asmlinkage void qic_enable_irq_interrupt(void); -extern asmlinkage void qic_call_function_interrupt(void); - /* SMP */ extern void smp_apic_timer_interrupt(struct pt_regs *); extern void smp_spurious_interrupt(struct pt_regs *); extern void smp_error_interrupt(struct pt_regs *); -#ifdef CONFIG_X86_SMP +#ifdef CONFIG_SMP extern void smp_reschedule_interrupt(struct pt_regs *); extern void smp_call_function_interrupt(struct pt_regs *); extern void smp_call_function_single_interrupt(struct pt_regs *); Index: linux-2.6-tip/arch/x86/include/asm/i8259.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/i8259.h +++ linux-2.6-tip/arch/x86/include/asm/i8259.h @@ -24,7 +24,7 @@ extern unsigned int cached_irq_mask; #define SLAVE_ICW4_DEFAULT 0x01 #define PIC_ICW4_AEOI 2 -extern spinlock_t i8259A_lock; +extern raw_spinlock_t i8259A_lock; extern void init_8259A(int auto_eoi); extern void enable_8259A_irq(unsigned int irq); @@ -60,4 +60,8 @@ extern struct irq_chip i8259A_chip; extern void mask_8259A(void); extern void unmask_8259A(void); +#ifdef CONFIG_X86_32 +extern void init_ISA_irqs(void); +#endif + #endif /* _ASM_X86_I8259_H */ Index: linux-2.6-tip/arch/x86/include/asm/init.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/init.h @@ -0,0 +1,18 @@ +#ifndef _ASM_X86_INIT_32_H +#define _ASM_X86_INIT_32_H + +#ifdef CONFIG_X86_32 +extern void __init early_ioremap_page_table_range_init(void); +#endif + +extern unsigned long __init +kernel_physical_mapping_init(unsigned long start, + unsigned long end, + unsigned long page_size_mask); + + +extern unsigned long __initdata e820_table_start; +extern unsigned long __meminitdata e820_table_end; +extern unsigned long __meminitdata e820_table_top; + +#endif /* _ASM_X86_INIT_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/intel_arch_perfmon.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/intel_arch_perfmon.h +++ /dev/null @@ -1,31 +0,0 @@ -#ifndef _ASM_X86_INTEL_ARCH_PERFMON_H -#define _ASM_X86_INTEL_ARCH_PERFMON_H - -#define MSR_ARCH_PERFMON_PERFCTR0 0xc1 -#define MSR_ARCH_PERFMON_PERFCTR1 0xc2 - -#define MSR_ARCH_PERFMON_EVENTSEL0 0x186 -#define MSR_ARCH_PERFMON_EVENTSEL1 0x187 - -#define ARCH_PERFMON_EVENTSEL0_ENABLE (1 << 22) -#define ARCH_PERFMON_EVENTSEL_INT (1 << 20) -#define ARCH_PERFMON_EVENTSEL_OS (1 << 17) -#define ARCH_PERFMON_EVENTSEL_USR (1 << 16) - -#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL (0x3c) -#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8) -#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX (0) -#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \ - (1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX)) - -union cpuid10_eax { - struct { - unsigned int version_id:8; - unsigned int num_counters:8; - unsigned int bit_width:8; - unsigned int mask_length:8; - } split; - unsigned int full; -}; - -#endif /* _ASM_X86_INTEL_ARCH_PERFMON_H */ Index: linux-2.6-tip/arch/x86/include/asm/io.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/io.h +++ linux-2.6-tip/arch/x86/include/asm/io.h @@ -5,6 +5,7 @@ #include #include +#include #define build_mmio_read(name, size, type, reg, barrier) \ static inline type name(const volatile void __iomem *addr) \ @@ -80,6 +81,98 @@ static inline void writeq(__u64 val, vol #define readq readq #define writeq writeq +/** + * virt_to_phys - map virtual addresses to physical + * @address: address to remap + * + * The returned physical address is the physical (CPU) mapping for + * the memory address given. It is only valid to use this function on + * addresses directly mapped or allocated via kmalloc. + * + * This function does not give bus mappings for DMA transfers. In + * almost all conceivable cases a device driver should not be using + * this function + */ + +static inline phys_addr_t virt_to_phys(volatile void *address) +{ + return __pa(address); +} + +/** + * phys_to_virt - map physical address to virtual + * @address: address to remap + * + * The returned virtual address is a current CPU mapping for + * the memory address given. It is only valid to use this function on + * addresses that have a kernel mapping + * + * This function does not handle bus mappings for DMA transfers. In + * almost all conceivable cases a device driver should not be using + * this function + */ + +static inline void *phys_to_virt(phys_addr_t address) +{ + return __va(address); +} + +/* + * Change "struct page" to physical address. + */ +#define page_to_phys(page) ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT) + +/* + * ISA I/O bus memory addresses are 1:1 with the physical address. + * However, we truncate the address to unsigned int to avoid undesirable + * promitions in legacy drivers. + */ +static inline unsigned int isa_virt_to_bus(volatile void *address) +{ + return (unsigned int)virt_to_phys(address); +} +#define isa_page_to_bus(page) ((unsigned int)page_to_phys(page)) +#define isa_bus_to_virt phys_to_virt + +/* + * However PCI ones are not necessarily 1:1 and therefore these interfaces + * are forbidden in portable PCI drivers. + * + * Allow them on x86 for legacy drivers, though. + */ +#define virt_to_bus virt_to_phys +#define bus_to_virt phys_to_virt + +/** + * ioremap - map bus memory into CPU space + * @offset: bus address of the memory + * @size: size of the resource to map + * + * ioremap performs a platform specific sequence of operations to + * make bus memory CPU accessible via the readb/readw/readl/writeb/ + * writew/writel functions and the other mmio helpers. The returned + * address is not guaranteed to be usable directly as a virtual + * address. + * + * If the area you are trying to map is a PCI BAR you should have a + * look at pci_iomap(). + */ +extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size); +extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); +extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, + unsigned long prot_val); + +/* + * The default ioremap() behavior is non-cached: + */ +static inline void __iomem *ioremap(resource_size_t offset, unsigned long size) +{ + return ioremap_nocache(offset, size); +} + +extern void iounmap(volatile void __iomem *addr); + + #ifdef CONFIG_X86_32 # include "io_32.h" #else @@ -91,7 +184,7 @@ extern void unxlate_dev_mem_ptr(unsigned extern int ioremap_change_attr(unsigned long vaddr, unsigned long size, unsigned long prot_val); -extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size); +extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size); /* * early_ioremap() and early_iounmap() are for temporary early boot-time @@ -100,10 +193,12 @@ extern void __iomem *ioremap_wc(unsigned */ extern void early_ioremap_init(void); extern void early_ioremap_reset(void); -extern void __iomem *early_ioremap(unsigned long offset, unsigned long size); -extern void __iomem *early_memremap(unsigned long offset, unsigned long size); +extern void __iomem *early_ioremap(resource_size_t phys_addr, + unsigned long size); +extern void __iomem *early_memremap(resource_size_t phys_addr, + unsigned long size); extern void early_iounmap(void __iomem *addr, unsigned long size); -extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys); +#define IO_SPACE_LIMIT 0xffff #endif /* _ASM_X86_IO_H */ Index: linux-2.6-tip/arch/x86/include/asm/io_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/io_32.h +++ linux-2.6-tip/arch/x86/include/asm/io_32.h @@ -37,8 +37,6 @@ * - Arnaldo Carvalho de Melo */ -#define IO_SPACE_LIMIT 0xffff - #define XQUAD_PORTIO_BASE 0xfe400000 #define XQUAD_PORTIO_QUAD 0x40000 /* 256k per quad. */ @@ -53,92 +51,6 @@ */ #define xlate_dev_kmem_ptr(p) p -/** - * virt_to_phys - map virtual addresses to physical - * @address: address to remap - * - * The returned physical address is the physical (CPU) mapping for - * the memory address given. It is only valid to use this function on - * addresses directly mapped or allocated via kmalloc. - * - * This function does not give bus mappings for DMA transfers. In - * almost all conceivable cases a device driver should not be using - * this function - */ - -static inline unsigned long virt_to_phys(volatile void *address) -{ - return __pa(address); -} - -/** - * phys_to_virt - map physical address to virtual - * @address: address to remap - * - * The returned virtual address is a current CPU mapping for - * the memory address given. It is only valid to use this function on - * addresses that have a kernel mapping - * - * This function does not handle bus mappings for DMA transfers. In - * almost all conceivable cases a device driver should not be using - * this function - */ - -static inline void *phys_to_virt(unsigned long address) -{ - return __va(address); -} - -/* - * Change "struct page" to physical address. - */ -#define page_to_phys(page) ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT) - -/** - * ioremap - map bus memory into CPU space - * @offset: bus address of the memory - * @size: size of the resource to map - * - * ioremap performs a platform specific sequence of operations to - * make bus memory CPU accessible via the readb/readw/readl/writeb/ - * writew/writel functions and the other mmio helpers. The returned - * address is not guaranteed to be usable directly as a virtual - * address. - * - * If the area you are trying to map is a PCI BAR you should have a - * look at pci_iomap(). - */ -extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size); -extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); -extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, - unsigned long prot_val); - -/* - * The default ioremap() behavior is non-cached: - */ -static inline void __iomem *ioremap(resource_size_t offset, unsigned long size) -{ - return ioremap_nocache(offset, size); -} - -extern void iounmap(volatile void __iomem *addr); - -/* - * ISA I/O bus memory addresses are 1:1 with the physical address. - */ -#define isa_virt_to_bus virt_to_phys -#define isa_page_to_bus page_to_phys -#define isa_bus_to_virt phys_to_virt - -/* - * However PCI ones are not necessarily 1:1 and therefore these interfaces - * are forbidden in portable PCI drivers. - * - * Allow them on x86 for legacy drivers, though. - */ -#define virt_to_bus virt_to_phys -#define bus_to_virt phys_to_virt - static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count) { Index: linux-2.6-tip/arch/x86/include/asm/io_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/io_64.h +++ linux-2.6-tip/arch/x86/include/asm/io_64.h @@ -136,73 +136,12 @@ __OUTS(b) __OUTS(w) __OUTS(l) -#define IO_SPACE_LIMIT 0xffff - #if defined(__KERNEL__) && defined(__x86_64__) #include -#ifndef __i386__ -/* - * Change virtual addresses to physical addresses and vv. - * These are pretty trivial - */ -static inline unsigned long virt_to_phys(volatile void *address) -{ - return __pa(address); -} - -static inline void *phys_to_virt(unsigned long address) -{ - return __va(address); -} -#endif - -/* - * Change "struct page" to physical address. - */ -#define page_to_phys(page) ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT) - #include -/* - * This one maps high address device memory and turns off caching for that area. - * it's useful if some control registers are in such an area and write combining - * or read caching is not desirable: - */ -extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size); -extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); -extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, - unsigned long prot_val); - -/* - * The default ioremap() behavior is non-cached: - */ -static inline void __iomem *ioremap(resource_size_t offset, unsigned long size) -{ - return ioremap_nocache(offset, size); -} - -extern void iounmap(volatile void __iomem *addr); - -extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys); - -/* - * ISA I/O bus memory addresses are 1:1 with the physical address. - */ -#define isa_virt_to_bus virt_to_phys -#define isa_page_to_bus page_to_phys -#define isa_bus_to_virt phys_to_virt - -/* - * However PCI ones are not necessarily 1:1 and therefore these interfaces - * are forbidden in portable PCI drivers. - * - * Allow them on x86 for legacy drivers, though. - */ -#define virt_to_bus virt_to_phys -#define bus_to_virt phys_to_virt - void __memcpy_fromio(void *, unsigned long, unsigned); void __memcpy_toio(unsigned long, const void *, unsigned); Index: linux-2.6-tip/arch/x86/include/asm/io_apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/io_apic.h +++ linux-2.6-tip/arch/x86/include/asm/io_apic.h @@ -114,38 +114,16 @@ struct IR_IO_APIC_route_entry { extern int nr_ioapics; extern int nr_ioapic_registers[MAX_IO_APICS]; -/* - * MP-BIOS irq configuration table structures: - */ - #define MP_MAX_IOAPIC_PIN 127 -struct mp_config_ioapic { - unsigned long mp_apicaddr; - unsigned int mp_apicid; - unsigned char mp_type; - unsigned char mp_apicver; - unsigned char mp_flags; -}; - -struct mp_config_intsrc { - unsigned int mp_dstapic; - unsigned char mp_type; - unsigned char mp_irqtype; - unsigned short mp_irqflag; - unsigned char mp_srcbus; - unsigned char mp_srcbusirq; - unsigned char mp_dstirq; -}; - /* I/O APIC entries */ -extern struct mp_config_ioapic mp_ioapics[MAX_IO_APICS]; +extern struct mpc_ioapic mp_ioapics[MAX_IO_APICS]; /* # of MP IRQ source entries */ extern int mp_irq_entries; /* MP IRQ source entries */ -extern struct mp_config_intsrc mp_irqs[MAX_IRQ_SOURCES]; +extern struct mpc_intsrc mp_irqs[MAX_IRQ_SOURCES]; /* non-0 if default (table-less) MP configuration */ extern int mpc_default_type; @@ -165,15 +143,6 @@ extern int noioapicreroute; /* 1 if the timer IRQ uses the '8259A Virtual Wire' mode */ extern int timer_through_8259; -static inline void disable_ioapic_setup(void) -{ -#ifdef CONFIG_PCI - noioapicquirk = 1; - noioapicreroute = -1; -#endif - skip_ioapic_setup = 1; -} - /* * If we use the IO-APIC for IRQ routing, disable automatic * assignment of PCI IRQ's. @@ -193,13 +162,20 @@ extern int (*ioapic_renumber_irq)(int io extern void ioapic_init_mappings(void); #ifdef CONFIG_X86_64 -extern int save_mask_IO_APIC_setup(void); +extern int save_IO_APIC_setup(void); +extern void mask_IO_APIC_setup(void); extern void restore_IO_APIC_setup(void); extern void reinit_intr_remapped_IO_APIC(int); #endif extern void probe_nr_irqs_gsi(void); +extern int setup_ioapic_entry(int apic, int irq, + struct IO_APIC_route_entry *entry, + unsigned int destination, int trigger, + int polarity, int vector, int pin); +extern void ioapic_write_entry(int apic, int pin, + struct IO_APIC_route_entry e); #else /* !CONFIG_X86_IO_APIC */ #define io_apic_assign_pci_irqs 0 static const int timer_through_8259 = 0; Index: linux-2.6-tip/arch/x86/include/asm/iommu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/iommu.h +++ linux-2.6-tip/arch/x86/include/asm/iommu.h @@ -3,7 +3,7 @@ extern void pci_iommu_shutdown(void); extern void no_iommu_init(void); -extern struct dma_mapping_ops nommu_dma_ops; +extern struct dma_map_ops nommu_dma_ops; extern int force_iommu, no_iommu; extern int iommu_detected; Index: linux-2.6-tip/arch/x86/include/asm/ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/ipi.h +++ linux-2.6-tip/arch/x86/include/asm/ipi.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_IPI_H #define _ASM_X86_IPI_H +#ifdef CONFIG_X86_LOCAL_APIC + /* * Copyright 2004 James Cleverdon, IBM. * Subject to the GNU Public License, v.2 @@ -55,8 +57,8 @@ static inline void __xapic_wait_icr_idle cpu_relax(); } -static inline void __send_IPI_shortcut(unsigned int shortcut, int vector, - unsigned int dest) +static inline void +__default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest) { /* * Subtle. In the case of the 'never do double writes' workaround @@ -87,8 +89,8 @@ static inline void __send_IPI_shortcut(u * This is used to send an IPI with no shorthand notation (the destination is * specified in bits 56 to 63 of the ICR). */ -static inline void __send_IPI_dest_field(unsigned int mask, int vector, - unsigned int dest) +static inline void + __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest) { unsigned long cfg; @@ -117,41 +119,44 @@ static inline void __send_IPI_dest_field native_apic_mem_write(APIC_ICR, cfg); } -static inline void send_IPI_mask_sequence(const struct cpumask *mask, - int vector) -{ - unsigned long flags; - unsigned long query_cpu; +extern void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, + int vector); +extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask, + int vector); +extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask, + int vector); +extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask, + int vector); + +/* Avoid include hell */ +#define NMI_VECTOR 0x02 - /* - * Hack. The clustered APIC addressing mode doesn't allow us to send - * to an arbitrary mask, so I do a unicast to each CPU instead. - * - mbligh - */ - local_irq_save(flags); - for_each_cpu(query_cpu, mask) { - __send_IPI_dest_field(per_cpu(x86_cpu_to_apicid, query_cpu), - vector, APIC_DEST_PHYSICAL); - } - local_irq_restore(flags); +extern int no_broadcast; + +static inline void __default_local_send_IPI_allbutself(int vector) +{ + if (no_broadcast || vector == NMI_VECTOR) + apic->send_IPI_mask_allbutself(cpu_online_mask, vector); + else + __default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector, apic->dest_logical); } -static inline void send_IPI_mask_allbutself(const struct cpumask *mask, - int vector) +static inline void __default_local_send_IPI_all(int vector) { - unsigned long flags; - unsigned int query_cpu; - unsigned int this_cpu = smp_processor_id(); + if (no_broadcast || vector == NMI_VECTOR) + apic->send_IPI_mask(cpu_online_mask, vector); + else + __default_send_IPI_shortcut(APIC_DEST_ALLINC, vector, apic->dest_logical); +} - /* See Hack comment above */ +#ifdef CONFIG_X86_32 +extern void default_send_IPI_mask_logical(const struct cpumask *mask, + int vector); +extern void default_send_IPI_allbutself(int vector); +extern void default_send_IPI_all(int vector); +extern void default_send_IPI_self(int vector); +#endif - local_irq_save(flags); - for_each_cpu(query_cpu, mask) - if (query_cpu != this_cpu) - __send_IPI_dest_field( - per_cpu(x86_cpu_to_apicid, query_cpu), - vector, APIC_DEST_PHYSICAL); - local_irq_restore(flags); -} +#endif #endif /* _ASM_X86_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/irq.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/irq.h +++ linux-2.6-tip/arch/x86/include/asm/irq.h @@ -36,9 +36,12 @@ static inline int irq_canonicalize(int i extern void fixup_irqs(void); #endif -extern unsigned int do_IRQ(struct pt_regs *regs); +extern void (*generic_interrupt_extension)(void); extern void init_IRQ(void); extern void native_init_IRQ(void); +extern bool handle_irq(unsigned irq, struct pt_regs *regs); + +extern unsigned int do_IRQ(struct pt_regs *regs); /* Interrupt vector management */ extern DECLARE_BITMAP(used_vectors, NR_VECTORS); Index: linux-2.6-tip/arch/x86/include/asm/irq_regs.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/irq_regs.h +++ linux-2.6-tip/arch/x86/include/asm/irq_regs.h @@ -1,5 +1,31 @@ -#ifdef CONFIG_X86_32 -# include "irq_regs_32.h" -#else -# include "irq_regs_64.h" -#endif +/* + * Per-cpu current frame pointer - the location of the last exception frame on + * the stack, stored in the per-cpu area. + * + * Jeremy Fitzhardinge + */ +#ifndef _ASM_X86_IRQ_REGS_H +#define _ASM_X86_IRQ_REGS_H + +#include + +#define ARCH_HAS_OWN_IRQ_REGS + +DECLARE_PER_CPU(struct pt_regs *, irq_regs); + +static inline struct pt_regs *get_irq_regs(void) +{ + return percpu_read(irq_regs); +} + +static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs) +{ + struct pt_regs *old_regs; + + old_regs = get_irq_regs(); + percpu_write(irq_regs, new_regs); + + return old_regs; +} + +#endif /* _ASM_X86_IRQ_REGS_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/irq_regs_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/irq_regs_32.h +++ /dev/null @@ -1,31 +0,0 @@ -/* - * Per-cpu current frame pointer - the location of the last exception frame on - * the stack, stored in the per-cpu area. - * - * Jeremy Fitzhardinge - */ -#ifndef _ASM_X86_IRQ_REGS_32_H -#define _ASM_X86_IRQ_REGS_32_H - -#include - -#define ARCH_HAS_OWN_IRQ_REGS - -DECLARE_PER_CPU(struct pt_regs *, irq_regs); - -static inline struct pt_regs *get_irq_regs(void) -{ - return x86_read_percpu(irq_regs); -} - -static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs) -{ - struct pt_regs *old_regs; - - old_regs = get_irq_regs(); - x86_write_percpu(irq_regs, new_regs); - - return old_regs; -} - -#endif /* _ASM_X86_IRQ_REGS_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/irq_regs_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/irq_regs_64.h +++ /dev/null @@ -1 +0,0 @@ -#include Index: linux-2.6-tip/arch/x86/include/asm/irq_remapping.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/irq_remapping.h +++ linux-2.6-tip/arch/x86/include/asm/irq_remapping.h @@ -1,8 +1,6 @@ #ifndef _ASM_X86_IRQ_REMAPPING_H #define _ASM_X86_IRQ_REMAPPING_H -extern int x2apic; - #define IRTE_DEST(dest) ((x2apic) ? dest : dest << 8) #endif /* _ASM_X86_IRQ_REMAPPING_H */ Index: linux-2.6-tip/arch/x86/include/asm/irq_vectors.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/irq_vectors.h +++ linux-2.6-tip/arch/x86/include/asm/irq_vectors.h @@ -1,47 +1,69 @@ #ifndef _ASM_X86_IRQ_VECTORS_H #define _ASM_X86_IRQ_VECTORS_H -#include +/* + * Linux IRQ vector layout. + * + * There are 256 IDT entries (per CPU - each entry is 8 bytes) which can + * be defined by Linux. They are used as a jump table by the CPU when a + * given vector is triggered - by a CPU-external, CPU-internal or + * software-triggered event. + * + * Linux sets the kernel code address each entry jumps to early during + * bootup, and never changes them. This is the general layout of the + * IDT entries: + * + * Vectors 0 ... 31 : system traps and exceptions - hardcoded events + * Vectors 32 ... 127 : device interrupts + * Vector 128 : legacy int80 syscall interface + * Vectors 129 ... 237 : device interrupts + * Vectors 238 ... 255 : special interrupts + * + * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table. + * + * This file enumerates the exact layout of them: + */ -#define NMI_VECTOR 0x02 +#define NMI_VECTOR 0x02 /* * IDT vectors usable for external interrupt sources start * at 0x20: */ -#define FIRST_EXTERNAL_VECTOR 0x20 +#define FIRST_EXTERNAL_VECTOR 0x20 #ifdef CONFIG_X86_32 -# define SYSCALL_VECTOR 0x80 +# define SYSCALL_VECTOR 0x80 #else -# define IA32_SYSCALL_VECTOR 0x80 +# define IA32_SYSCALL_VECTOR 0x80 #endif /* * Reserve the lowest usable priority level 0x20 - 0x2f for triggering * cleanup after irq migration. */ -#define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR +#define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR /* * Vectors 0x30-0x3f are used for ISA interrupts. */ -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) -#define IRQ1_VECTOR (IRQ0_VECTOR + 1) -#define IRQ2_VECTOR (IRQ0_VECTOR + 2) -#define IRQ3_VECTOR (IRQ0_VECTOR + 3) -#define IRQ4_VECTOR (IRQ0_VECTOR + 4) -#define IRQ5_VECTOR (IRQ0_VECTOR + 5) -#define IRQ6_VECTOR (IRQ0_VECTOR + 6) -#define IRQ7_VECTOR (IRQ0_VECTOR + 7) -#define IRQ8_VECTOR (IRQ0_VECTOR + 8) -#define IRQ9_VECTOR (IRQ0_VECTOR + 9) -#define IRQ10_VECTOR (IRQ0_VECTOR + 10) -#define IRQ11_VECTOR (IRQ0_VECTOR + 11) -#define IRQ12_VECTOR (IRQ0_VECTOR + 12) -#define IRQ13_VECTOR (IRQ0_VECTOR + 13) -#define IRQ14_VECTOR (IRQ0_VECTOR + 14) -#define IRQ15_VECTOR (IRQ0_VECTOR + 15) +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) + +#define IRQ1_VECTOR (IRQ0_VECTOR + 1) +#define IRQ2_VECTOR (IRQ0_VECTOR + 2) +#define IRQ3_VECTOR (IRQ0_VECTOR + 3) +#define IRQ4_VECTOR (IRQ0_VECTOR + 4) +#define IRQ5_VECTOR (IRQ0_VECTOR + 5) +#define IRQ6_VECTOR (IRQ0_VECTOR + 6) +#define IRQ7_VECTOR (IRQ0_VECTOR + 7) +#define IRQ8_VECTOR (IRQ0_VECTOR + 8) +#define IRQ9_VECTOR (IRQ0_VECTOR + 9) +#define IRQ10_VECTOR (IRQ0_VECTOR + 10) +#define IRQ11_VECTOR (IRQ0_VECTOR + 11) +#define IRQ12_VECTOR (IRQ0_VECTOR + 12) +#define IRQ13_VECTOR (IRQ0_VECTOR + 13) +#define IRQ14_VECTOR (IRQ0_VECTOR + 14) +#define IRQ15_VECTOR (IRQ0_VECTOR + 15) /* * Special IRQ vectors used by the SMP architecture, 0xf0-0xff @@ -49,119 +71,103 @@ * some of the following vectors are 'rare', they are merged * into a single vector (CALL_FUNCTION_VECTOR) to save vector space. * TLB, reschedule and local APIC vectors are performance-critical. - * - * Vectors 0xf0-0xfa are free (reserved for future Linux use). */ -#ifdef CONFIG_X86_32 - -# define SPURIOUS_APIC_VECTOR 0xff -# define ERROR_APIC_VECTOR 0xfe -# define INVALIDATE_TLB_VECTOR 0xfd -# define RESCHEDULE_VECTOR 0xfc -# define CALL_FUNCTION_VECTOR 0xfb -# define CALL_FUNCTION_SINGLE_VECTOR 0xfa -# define THERMAL_APIC_VECTOR 0xf0 - -#else #define SPURIOUS_APIC_VECTOR 0xff +/* + * Sanity check + */ +#if ((SPURIOUS_APIC_VECTOR & 0x0F) != 0x0F) +# error SPURIOUS_APIC_VECTOR definition error +#endif + #define ERROR_APIC_VECTOR 0xfe #define RESCHEDULE_VECTOR 0xfd #define CALL_FUNCTION_VECTOR 0xfc #define CALL_FUNCTION_SINGLE_VECTOR 0xfb #define THERMAL_APIC_VECTOR 0xfa -#define THRESHOLD_APIC_VECTOR 0xf9 -#define UV_BAU_MESSAGE 0xf8 -#define INVALIDATE_TLB_VECTOR_END 0xf7 -#define INVALIDATE_TLB_VECTOR_START 0xf0 /* f0-f7 used for TLB flush */ - -#define NUM_INVALIDATE_TLB_VECTORS 8 +#ifdef CONFIG_X86_32 +/* 0xf8 - 0xf9 : free */ +#else +# define THRESHOLD_APIC_VECTOR 0xf9 +# define UV_BAU_MESSAGE 0xf8 #endif +/* f0-f7 used for spreading out TLB flushes: */ +#define INVALIDATE_TLB_VECTOR_END 0xf7 +#define INVALIDATE_TLB_VECTOR_START 0xf0 +#define NUM_INVALIDATE_TLB_VECTORS 8 + /* * Local APIC timer IRQ vector is on a different priority level, * to work around the 'lost local interrupt if more than 2 IRQ * sources per level' errata. */ -#define LOCAL_TIMER_VECTOR 0xef +#define LOCAL_TIMER_VECTOR 0xef + +/* + * Performance monitoring interrupt vector: + */ +#define LOCAL_PERF_VECTOR 0xee + +/* + * Generic system vector for platform specific use + */ +#define GENERIC_INTERRUPT_VECTOR 0xed /* * First APIC vector available to drivers: (vectors 0x30-0xee) we * start at 0x31(0x41) to spread out vectors evenly between priority * levels. (0x80 is the syscall vector) */ -#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) -#define NR_VECTORS 256 +#define NR_VECTORS 256 -#define FPU_IRQ 13 +#define FPU_IRQ 13 -#define FIRST_VM86_IRQ 3 -#define LAST_VM86_IRQ 15 -#define invalid_vm86_irq(irq) ((irq) < 3 || (irq) > 15) +#define FIRST_VM86_IRQ 3 +#define LAST_VM86_IRQ 15 -#define NR_IRQS_LEGACY 16 - -#if defined(CONFIG_X86_IO_APIC) && !defined(CONFIG_X86_VOYAGER) - -#ifndef CONFIG_SPARSE_IRQ -# if NR_CPUS < MAX_IO_APICS -# define NR_IRQS (NR_VECTORS + (32 * NR_CPUS)) -# else -# define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS)) -# endif -#else -# if (8 * NR_CPUS) > (32 * MAX_IO_APICS) -# define NR_IRQS (NR_VECTORS + (8 * NR_CPUS)) -# else -# define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS)) -# endif +#ifndef __ASSEMBLY__ +static inline int invalid_vm86_irq(int irq) +{ + return irq < FIRST_VM86_IRQ || irq > LAST_VM86_IRQ; +} #endif -#elif defined(CONFIG_X86_VOYAGER) - -# define NR_IRQS 224 +/* + * Size the maximum number of interrupts. + * + * If the irq_desc[] array has a sparse layout, we can size things + * generously - it scales up linearly with the maximum number of CPUs, + * and the maximum number of IO-APICs, whichever is higher. + * + * In other cases we size more conservatively, to not create too large + * static arrays. + */ -#else /* IO_APIC || VOYAGER */ +#define NR_IRQS_LEGACY 16 -# define NR_IRQS 16 +#define CPU_VECTOR_LIMIT ( 8 * NR_CPUS ) +#define IO_APIC_VECTOR_LIMIT ( 32 * MAX_IO_APICS ) +#ifdef CONFIG_X86_IO_APIC +# ifdef CONFIG_SPARSE_IRQ +# define NR_IRQS \ + (CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ? \ + (NR_VECTORS + CPU_VECTOR_LIMIT) : \ + (NR_VECTORS + IO_APIC_VECTOR_LIMIT)) +# else +# if NR_CPUS < MAX_IO_APICS +# define NR_IRQS (NR_VECTORS + 4*CPU_VECTOR_LIMIT) +# else +# define NR_IRQS (NR_VECTORS + IO_APIC_VECTOR_LIMIT) +# endif +# endif +#else /* !CONFIG_X86_IO_APIC: */ +# define NR_IRQS NR_IRQS_LEGACY #endif -/* Voyager specific defines */ -/* These define the CPIs we use in linux */ -#define VIC_CPI_LEVEL0 0 -#define VIC_CPI_LEVEL1 1 -/* now the fake CPIs */ -#define VIC_TIMER_CPI 2 -#define VIC_INVALIDATE_CPI 3 -#define VIC_RESCHEDULE_CPI 4 -#define VIC_ENABLE_IRQ_CPI 5 -#define VIC_CALL_FUNCTION_CPI 6 -#define VIC_CALL_FUNCTION_SINGLE_CPI 7 - -/* Now the QIC CPIs: Since we don't need the two initial levels, - * these are 2 less than the VIC CPIs */ -#define QIC_CPI_OFFSET 1 -#define QIC_TIMER_CPI (VIC_TIMER_CPI - QIC_CPI_OFFSET) -#define QIC_INVALIDATE_CPI (VIC_INVALIDATE_CPI - QIC_CPI_OFFSET) -#define QIC_RESCHEDULE_CPI (VIC_RESCHEDULE_CPI - QIC_CPI_OFFSET) -#define QIC_ENABLE_IRQ_CPI (VIC_ENABLE_IRQ_CPI - QIC_CPI_OFFSET) -#define QIC_CALL_FUNCTION_CPI (VIC_CALL_FUNCTION_CPI - QIC_CPI_OFFSET) -#define QIC_CALL_FUNCTION_SINGLE_CPI (VIC_CALL_FUNCTION_SINGLE_CPI - QIC_CPI_OFFSET) - -#define VIC_START_FAKE_CPI VIC_TIMER_CPI -#define VIC_END_FAKE_CPI VIC_CALL_FUNCTION_SINGLE_CPI - -/* this is the SYS_INT CPI. */ -#define VIC_SYS_INT 8 -#define VIC_CMN_INT 15 - -/* This is the boot CPI for alternate processors. It gets overwritten - * by the above once the system has activated all available processors */ -#define VIC_CPU_BOOT_CPI VIC_CPI_LEVEL0 -#define VIC_CPU_BOOT_ERRATA_CPI (VIC_CPI_LEVEL0 + 8) - - #endif /* _ASM_X86_IRQ_VECTORS_H */ Index: linux-2.6-tip/arch/x86/include/asm/kexec.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/kexec.h +++ linux-2.6-tip/arch/x86/include/asm/kexec.h @@ -10,27 +10,12 @@ #else # define PA_CONTROL_PAGE 0 # define VA_CONTROL_PAGE 1 -# define PA_PGD 2 -# define VA_PGD 3 -# define PA_PUD_0 4 -# define VA_PUD_0 5 -# define PA_PMD_0 6 -# define VA_PMD_0 7 -# define PA_PTE_0 8 -# define VA_PTE_0 9 -# define PA_PUD_1 10 -# define VA_PUD_1 11 -# define PA_PMD_1 12 -# define VA_PMD_1 13 -# define PA_PTE_1 14 -# define VA_PTE_1 15 -# define PA_TABLE_PAGE 16 -# define PAGES_NR 17 +# define PA_TABLE_PAGE 2 +# define PA_SWAP_PAGE 3 +# define PAGES_NR 4 #endif -#ifdef CONFIG_X86_32 # define KEXEC_CONTROL_CODE_MAX_SIZE 2048 -#endif #ifndef __ASSEMBLY__ @@ -151,15 +136,16 @@ relocate_kernel(unsigned long indirectio unsigned int has_pae, unsigned int preserve_context); #else -NORET_TYPE void +unsigned long relocate_kernel(unsigned long indirection_page, unsigned long page_list, - unsigned long start_address) ATTRIB_NORET; + unsigned long start_address, + unsigned int preserve_context); #endif -#ifdef CONFIG_X86_32 #define ARCH_HAS_KIMAGE_ARCH +#ifdef CONFIG_X86_32 struct kimage_arch { pgd_t *pgd; #ifdef CONFIG_X86_PAE @@ -169,6 +155,12 @@ struct kimage_arch { pte_t *pte0; pte_t *pte1; }; +#else +struct kimage_arch { + pud_t *pud; + pmd_t *pmd; + pte_t *pte; +}; #endif #endif /* __ASSEMBLY__ */ Index: linux-2.6-tip/arch/x86/include/asm/kmemcheck.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/kmemcheck.h @@ -0,0 +1,42 @@ +#ifndef ASM_X86_KMEMCHECK_H +#define ASM_X86_KMEMCHECK_H + +#include +#include + +#ifdef CONFIG_KMEMCHECK +bool kmemcheck_active(struct pt_regs *regs); + +void kmemcheck_show(struct pt_regs *regs); +void kmemcheck_hide(struct pt_regs *regs); + +bool kmemcheck_fault(struct pt_regs *regs, + unsigned long address, unsigned long error_code); +bool kmemcheck_trap(struct pt_regs *regs); +#else +static inline bool kmemcheck_active(struct pt_regs *regs) +{ + return false; +} + +static inline void kmemcheck_show(struct pt_regs *regs) +{ +} + +static inline void kmemcheck_hide(struct pt_regs *regs) +{ +} + +static inline bool kmemcheck_fault(struct pt_regs *regs, + unsigned long address, unsigned long error_code) +{ + return false; +} + +static inline bool kmemcheck_trap(struct pt_regs *regs) +{ + return false; +} +#endif /* CONFIG_KMEMCHECK */ + +#endif Index: linux-2.6-tip/arch/x86/include/asm/linkage.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/linkage.h +++ linux-2.6-tip/arch/x86/include/asm/linkage.h @@ -1,14 +1,11 @@ #ifndef _ASM_X86_LINKAGE_H #define _ASM_X86_LINKAGE_H +#include + #undef notrace #define notrace __attribute__((no_instrument_function)) -#ifdef CONFIG_X86_64 -#define __ALIGN .p2align 4,,15 -#define __ALIGN_STR ".p2align 4,,15" -#endif - #ifdef CONFIG_X86_32 #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0))) /* @@ -50,72 +47,20 @@ __asmlinkage_protect_n(ret, "g" (arg1), "g" (arg2), "g" (arg3), \ "g" (arg4), "g" (arg5), "g" (arg6)) -#endif +#endif /* CONFIG_X86_32 */ -#ifdef CONFIG_X86_ALIGNMENT_16 -#define __ALIGN .align 16,0x90 -#define __ALIGN_STR ".align 16,0x90" -#endif +#ifdef __ASSEMBLY__ -/* - * to check ENTRY_X86/END_X86 and - * KPROBE_ENTRY_X86/KPROBE_END_X86 - * unbalanced-missed-mixed appearance - */ -#define __set_entry_x86 .set ENTRY_X86_IN, 0 -#define __unset_entry_x86 .set ENTRY_X86_IN, 1 -#define __set_kprobe_x86 .set KPROBE_X86_IN, 0 -#define __unset_kprobe_x86 .set KPROBE_X86_IN, 1 - -#define __macro_err_x86 .error "ENTRY_X86/KPROBE_X86 unbalanced,missed,mixed" - -#define __check_entry_x86 \ - .ifdef ENTRY_X86_IN; \ - .ifeq ENTRY_X86_IN; \ - __macro_err_x86; \ - .abort; \ - .endif; \ - .endif - -#define __check_kprobe_x86 \ - .ifdef KPROBE_X86_IN; \ - .ifeq KPROBE_X86_IN; \ - __macro_err_x86; \ - .abort; \ - .endif; \ - .endif - -#define __check_entry_kprobe_x86 \ - __check_entry_x86; \ - __check_kprobe_x86 - -#define ENTRY_KPROBE_FINAL_X86 __check_entry_kprobe_x86 - -#define ENTRY_X86(name) \ - __check_entry_kprobe_x86; \ - __set_entry_x86; \ - .globl name; \ - __ALIGN; \ +#define GLOBAL(name) \ + .globl name; \ name: -#define END_X86(name) \ - __unset_entry_x86; \ - __check_entry_kprobe_x86; \ - .size name, .-name - -#define KPROBE_ENTRY_X86(name) \ - __check_entry_kprobe_x86; \ - __set_kprobe_x86; \ - .pushsection .kprobes.text, "ax"; \ - .globl name; \ - __ALIGN; \ - name: +#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16) +#define __ALIGN .p2align 4, 0x90 +#define __ALIGN_STR __stringify(__ALIGN) +#endif -#define KPROBE_END_X86(name) \ - __unset_kprobe_x86; \ - __check_entry_kprobe_x86; \ - .size name, .-name; \ - .popsection +#endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_LINKAGE_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/apm.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/apm.h +++ /dev/null @@ -1,73 +0,0 @@ -/* - * Machine specific APM BIOS functions for generic. - * Split out from apm.c by Osamu Tomita - */ - -#ifndef _ASM_X86_MACH_DEFAULT_APM_H -#define _ASM_X86_MACH_DEFAULT_APM_H - -#ifdef APM_ZERO_SEGS -# define APM_DO_ZERO_SEGS \ - "pushl %%ds\n\t" \ - "pushl %%es\n\t" \ - "xorl %%edx, %%edx\n\t" \ - "mov %%dx, %%ds\n\t" \ - "mov %%dx, %%es\n\t" \ - "mov %%dx, %%fs\n\t" \ - "mov %%dx, %%gs\n\t" -# define APM_DO_POP_SEGS \ - "popl %%es\n\t" \ - "popl %%ds\n\t" -#else -# define APM_DO_ZERO_SEGS -# define APM_DO_POP_SEGS -#endif - -static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in, - u32 *eax, u32 *ebx, u32 *ecx, - u32 *edx, u32 *esi) -{ - /* - * N.B. We do NOT need a cld after the BIOS call - * because we always save and restore the flags. - */ - __asm__ __volatile__(APM_DO_ZERO_SEGS - "pushl %%edi\n\t" - "pushl %%ebp\n\t" - "lcall *%%cs:apm_bios_entry\n\t" - "setc %%al\n\t" - "popl %%ebp\n\t" - "popl %%edi\n\t" - APM_DO_POP_SEGS - : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx), - "=S" (*esi) - : "a" (func), "b" (ebx_in), "c" (ecx_in) - : "memory", "cc"); -} - -static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in, - u32 ecx_in, u32 *eax) -{ - int cx, dx, si; - u8 error; - - /* - * N.B. We do NOT need a cld after the BIOS call - * because we always save and restore the flags. - */ - __asm__ __volatile__(APM_DO_ZERO_SEGS - "pushl %%edi\n\t" - "pushl %%ebp\n\t" - "lcall *%%cs:apm_bios_entry\n\t" - "setc %%bl\n\t" - "popl %%ebp\n\t" - "popl %%edi\n\t" - APM_DO_POP_SEGS - : "=a" (*eax), "=b" (error), "=c" (cx), "=d" (dx), - "=S" (si) - : "a" (func), "b" (ebx_in), "c" (ecx_in) - : "memory", "cc"); - return error; -} - -#endif /* _ASM_X86_MACH_DEFAULT_APM_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/do_timer.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/do_timer.h +++ /dev/null @@ -1,16 +0,0 @@ -/* defines for inline arch setup functions */ -#include - -#include -#include - -/** - * do_timer_interrupt_hook - hook into timer tick - * - * Call the pit clock event handler. see asm/i8253.h - **/ - -static inline void do_timer_interrupt_hook(void) -{ - global_clock_event->event_handler(global_clock_event); -} Index: linux-2.6-tip/arch/x86/include/asm/mach-default/entry_arch.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/entry_arch.h +++ /dev/null @@ -1,36 +0,0 @@ -/* - * This file is designed to contain the BUILD_INTERRUPT specifications for - * all of the extra named interrupt vectors used by the architecture. - * Usually this is the Inter Process Interrupts (IPIs) - */ - -/* - * The following vectors are part of the Linux architecture, there - * is no hardware IRQ pin equivalent for them, they are triggered - * through the ICC by us (IPIs) - */ -#ifdef CONFIG_X86_SMP -BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR) -BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR) -BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR) -BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR) -BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR) -#endif - -/* - * every pentium local APIC has two 'local interrupts', with a - * soft-definable vector attached to both interrupts, one of - * which is a timer interrupt, the other one is error counter - * overflow. Linux uses the local APIC timer interrupt to get - * a much simpler SMP time architecture: - */ -#ifdef CONFIG_X86_LOCAL_APIC -BUILD_INTERRUPT(apic_timer_interrupt,LOCAL_TIMER_VECTOR) -BUILD_INTERRUPT(error_interrupt,ERROR_APIC_VECTOR) -BUILD_INTERRUPT(spurious_interrupt,SPURIOUS_APIC_VECTOR) - -#ifdef CONFIG_X86_MCE_P4THERMAL -BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR) -#endif - -#endif Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_apic.h +++ /dev/null @@ -1,168 +0,0 @@ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_APIC_H -#define _ASM_X86_MACH_DEFAULT_MACH_APIC_H - -#ifdef CONFIG_X86_LOCAL_APIC - -#include -#include - -#define APIC_DFR_VALUE (APIC_DFR_FLAT) - -static inline const struct cpumask *target_cpus(void) -{ -#ifdef CONFIG_SMP - return cpu_online_mask; -#else - return cpumask_of(0); -#endif -} - -#define NO_BALANCE_IRQ (0) -#define esr_disable (0) - -#ifdef CONFIG_X86_64 -#include -#define INT_DELIVERY_MODE (genapic->int_delivery_mode) -#define INT_DEST_MODE (genapic->int_dest_mode) -#define TARGET_CPUS (genapic->target_cpus()) -#define apic_id_registered (genapic->apic_id_registered) -#define init_apic_ldr (genapic->init_apic_ldr) -#define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid) -#define cpu_mask_to_apicid_and (genapic->cpu_mask_to_apicid_and) -#define phys_pkg_id (genapic->phys_pkg_id) -#define vector_allocation_domain (genapic->vector_allocation_domain) -#define read_apic_id() (GET_APIC_ID(apic_read(APIC_ID))) -#define send_IPI_self (genapic->send_IPI_self) -#define wakeup_secondary_cpu (genapic->wakeup_cpu) -extern void setup_apic_routing(void); -#else -#define INT_DELIVERY_MODE dest_LowestPrio -#define INT_DEST_MODE 1 /* logical delivery broadcast to all procs */ -#define TARGET_CPUS (target_cpus()) -#define wakeup_secondary_cpu wakeup_secondary_cpu_via_init -/* - * Set up the logical destination ID. - * - * Intel recommends to set DFR, LDR and TPR before enabling - * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel - * document number 292116). So here it goes... - */ -static inline void init_apic_ldr(void) -{ - unsigned long val; - - apic_write(APIC_DFR, APIC_DFR_VALUE); - val = apic_read(APIC_LDR) & ~APIC_LDR_MASK; - val |= SET_APIC_LOGICAL_ID(1UL << smp_processor_id()); - apic_write(APIC_LDR, val); -} - -static inline int apic_id_registered(void) -{ - return physid_isset(read_apic_id(), phys_cpu_present_map); -} - -static inline unsigned int cpu_mask_to_apicid(const struct cpumask *cpumask) -{ - return cpumask_bits(cpumask)[0]; -} - -static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask, - const struct cpumask *andmask) -{ - unsigned long mask1 = cpumask_bits(cpumask)[0]; - unsigned long mask2 = cpumask_bits(andmask)[0]; - unsigned long mask3 = cpumask_bits(cpu_online_mask)[0]; - - return (unsigned int)(mask1 & mask2 & mask3); -} - -static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb) -{ - return cpuid_apic >> index_msb; -} - -static inline void setup_apic_routing(void) -{ -#ifdef CONFIG_X86_IO_APIC - printk("Enabling APIC mode: %s. Using %d I/O APICs\n", - "Flat", nr_ioapics); -#endif -} - -static inline int apicid_to_node(int logical_apicid) -{ -#ifdef CONFIG_SMP - return apicid_2_node[hard_smp_processor_id()]; -#else - return 0; -#endif -} - -static inline void vector_allocation_domain(int cpu, struct cpumask *retmask) -{ - /* Careful. Some cpus do not strictly honor the set of cpus - * specified in the interrupt destination when using lowest - * priority interrupt delivery mode. - * - * In particular there was a hyperthreading cpu observed to - * deliver interrupts to the wrong hyperthread when only one - * hyperthread was specified in the interrupt desitination. - */ - *retmask = (cpumask_t) { { [0] = APIC_ALL_CPUS } }; -} -#endif - -static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid) -{ - return physid_isset(apicid, bitmap); -} - -static inline unsigned long check_apicid_present(int bit) -{ - return physid_isset(bit, phys_cpu_present_map); -} - -static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map) -{ - return phys_map; -} - -static inline int multi_timer_check(int apic, int irq) -{ - return 0; -} - -/* Mapping from cpu number to logical apicid */ -static inline int cpu_to_logical_apicid(int cpu) -{ - return 1 << cpu; -} - -static inline int cpu_present_to_apicid(int mps_cpu) -{ - if (mps_cpu < nr_cpu_ids && cpu_present(mps_cpu)) - return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu); - else - return BAD_APICID; -} - -static inline physid_mask_t apicid_to_cpu_present(int phys_apicid) -{ - return physid_mask_of_physid(phys_apicid); -} - -static inline void setup_portio_remap(void) -{ -} - -static inline int check_phys_apicid_present(int boot_cpu_physical_apicid) -{ - return physid_isset(boot_cpu_physical_apicid, phys_cpu_present_map); -} - -static inline void enable_apic_mode(void) -{ -} -#endif /* CONFIG_X86_LOCAL_APIC */ -#endif /* _ASM_X86_MACH_DEFAULT_MACH_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_apicdef.h +++ /dev/null @@ -1,24 +0,0 @@ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_APICDEF_H -#define _ASM_X86_MACH_DEFAULT_MACH_APICDEF_H - -#include - -#ifdef CONFIG_X86_64 -#define APIC_ID_MASK (genapic->apic_id_mask) -#define GET_APIC_ID(x) (genapic->get_apic_id(x)) -#define SET_APIC_ID(x) (genapic->set_apic_id(x)) -#else -#define APIC_ID_MASK (0xF<<24) -static inline unsigned get_apic_id(unsigned long x) -{ - unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR)); - if (APIC_XAPIC(ver)) - return (((x)>>24)&0xFF); - else - return (((x)>>24)&0xF); -} - -#define GET_APIC_ID(x) get_apic_id(x) -#endif - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_APICDEF_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_ipi.h +++ /dev/null @@ -1,64 +0,0 @@ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_IPI_H -#define _ASM_X86_MACH_DEFAULT_MACH_IPI_H - -/* Avoid include hell */ -#define NMI_VECTOR 0x02 - -void send_IPI_mask_bitmask(const struct cpumask *mask, int vector); -void send_IPI_mask_allbutself(const struct cpumask *mask, int vector); -void __send_IPI_shortcut(unsigned int shortcut, int vector); - -extern int no_broadcast; - -#ifdef CONFIG_X86_64 -#include -#define send_IPI_mask (genapic->send_IPI_mask) -#define send_IPI_mask_allbutself (genapic->send_IPI_mask_allbutself) -#else -static inline void send_IPI_mask(const struct cpumask *mask, int vector) -{ - send_IPI_mask_bitmask(mask, vector); -} -void send_IPI_mask_allbutself(const struct cpumask *mask, int vector); -#endif - -static inline void __local_send_IPI_allbutself(int vector) -{ - if (no_broadcast || vector == NMI_VECTOR) - send_IPI_mask_allbutself(cpu_online_mask, vector); - else - __send_IPI_shortcut(APIC_DEST_ALLBUT, vector); -} - -static inline void __local_send_IPI_all(int vector) -{ - if (no_broadcast || vector == NMI_VECTOR) - send_IPI_mask(cpu_online_mask, vector); - else - __send_IPI_shortcut(APIC_DEST_ALLINC, vector); -} - -#ifdef CONFIG_X86_64 -#define send_IPI_allbutself (genapic->send_IPI_allbutself) -#define send_IPI_all (genapic->send_IPI_all) -#else -static inline void send_IPI_allbutself(int vector) -{ - /* - * if there are no other CPUs in the system then we get an APIC send - * error if we try to broadcast, thus avoid sending IPIs in this case. - */ - if (!(num_online_cpus() > 1)) - return; - - __local_send_IPI_allbutself(vector); - return; -} - -static inline void send_IPI_all(int vector) -{ - __local_send_IPI_all(vector); -} -#endif - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_mpparse.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_mpparse.h +++ /dev/null @@ -1,17 +0,0 @@ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_MPPARSE_H -#define _ASM_X86_MACH_DEFAULT_MACH_MPPARSE_H - -static inline int -mps_oem_check(struct mpc_table *mpc, char *oem, char *productid) -{ - return 0; -} - -/* Hook from generic ACPI tables.c */ -static inline int acpi_madt_oem_check(char *oem_id, char *oem_table_id) -{ - return 0; -} - - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_MPPARSE_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_mpspec.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_mpspec.h +++ /dev/null @@ -1,12 +0,0 @@ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_MPSPEC_H -#define _ASM_X86_MACH_DEFAULT_MACH_MPSPEC_H - -#define MAX_IRQ_SOURCES 256 - -#if CONFIG_BASE_SMALL == 0 -#define MAX_MP_BUSSES 256 -#else -#define MAX_MP_BUSSES 32 -#endif - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_MPSPEC_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_timer.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_timer.h +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Machine specific calibrate_tsc() for generic. - * Split out from timer_tsc.c by Osamu Tomita - */ -/* ------ Calibrate the TSC ------- - * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset(). - * Too much 64-bit arithmetic here to do this cleanly in C, and for - * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2) - * output busy loop as low as possible. We avoid reading the CTC registers - * directly because of the awkward 8-bit access mechanism of the 82C54 - * device. - */ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_TIMER_H -#define _ASM_X86_MACH_DEFAULT_MACH_TIMER_H - -#define CALIBRATE_TIME_MSEC 30 /* 30 msecs */ -#define CALIBRATE_LATCH \ - ((CLOCK_TICK_RATE * CALIBRATE_TIME_MSEC + 1000/2)/1000) - -static inline void mach_prepare_counter(void) -{ - /* Set the Gate high, disable speaker */ - outb((inb(0x61) & ~0x02) | 0x01, 0x61); - - /* - * Now let's take care of CTC channel 2 - * - * Set the Gate high, program CTC channel 2 for mode 0, - * (interrupt on terminal count mode), binary count, - * load 5 * LATCH count, (LSB and MSB) to begin countdown. - * - * Some devices need a delay here. - */ - outb(0xb0, 0x43); /* binary, mode 0, LSB/MSB, Ch 2 */ - outb_p(CALIBRATE_LATCH & 0xff, 0x42); /* LSB of count */ - outb_p(CALIBRATE_LATCH >> 8, 0x42); /* MSB of count */ -} - -static inline void mach_countup(unsigned long *count_p) -{ - unsigned long count = 0; - do { - count++; - } while ((inb_p(0x61) & 0x20) == 0); - *count_p = count; -} - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_TIMER_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_traps.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_traps.h +++ /dev/null @@ -1,33 +0,0 @@ -/* - * Machine specific NMI handling for generic. - * Split out from traps.c by Osamu Tomita - */ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_TRAPS_H -#define _ASM_X86_MACH_DEFAULT_MACH_TRAPS_H - -#include - -static inline unsigned char get_nmi_reason(void) -{ - return inb(0x61); -} - -static inline void reassert_nmi(void) -{ - int old_reg = -1; - - if (do_i_have_lock_cmos()) - old_reg = current_lock_cmos_reg(); - else - lock_cmos(0); /* register doesn't matter here */ - outb(0x8f, 0x70); - inb(0x71); /* dummy */ - outb(0x0f, 0x70); - inb(0x71); /* dummy */ - if (old_reg >= 0) - outb(old_reg, 0x70); - else - unlock_cmos(); -} - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_TRAPS_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/mach_wakecpu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/mach_wakecpu.h +++ /dev/null @@ -1,41 +0,0 @@ -#ifndef _ASM_X86_MACH_DEFAULT_MACH_WAKECPU_H -#define _ASM_X86_MACH_DEFAULT_MACH_WAKECPU_H - -#define TRAMPOLINE_PHYS_LOW (0x467) -#define TRAMPOLINE_PHYS_HIGH (0x469) - -static inline void wait_for_init_deassert(atomic_t *deassert) -{ - while (!atomic_read(deassert)) - cpu_relax(); - return; -} - -/* Nothing to do for most platforms, since cleared by the INIT cycle */ -static inline void smp_callin_clear_local_apic(void) -{ -} - -static inline void store_NMI_vector(unsigned short *high, unsigned short *low) -{ -} - -static inline void restore_NMI_vector(unsigned short *high, unsigned short *low) -{ -} - -#ifdef CONFIG_SMP -extern void __inquire_remote_apic(int apicid); -#else /* CONFIG_SMP */ -static inline void __inquire_remote_apic(int apicid) -{ -} -#endif /* CONFIG_SMP */ - -static inline void inquire_remote_apic(int apicid) -{ - if (apic_verbosity >= APIC_DEBUG) - __inquire_remote_apic(apicid); -} - -#endif /* _ASM_X86_MACH_DEFAULT_MACH_WAKECPU_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/pci-functions.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/pci-functions.h +++ /dev/null @@ -1,19 +0,0 @@ -/* - * PCI BIOS function numbering for conventional PCI BIOS - * systems - */ - -#define PCIBIOS_PCI_FUNCTION_ID 0xb1XX -#define PCIBIOS_PCI_BIOS_PRESENT 0xb101 -#define PCIBIOS_FIND_PCI_DEVICE 0xb102 -#define PCIBIOS_FIND_PCI_CLASS_CODE 0xb103 -#define PCIBIOS_GENERATE_SPECIAL_CYCLE 0xb106 -#define PCIBIOS_READ_CONFIG_BYTE 0xb108 -#define PCIBIOS_READ_CONFIG_WORD 0xb109 -#define PCIBIOS_READ_CONFIG_DWORD 0xb10a -#define PCIBIOS_WRITE_CONFIG_BYTE 0xb10b -#define PCIBIOS_WRITE_CONFIG_WORD 0xb10c -#define PCIBIOS_WRITE_CONFIG_DWORD 0xb10d -#define PCIBIOS_GET_ROUTING_OPTIONS 0xb10e -#define PCIBIOS_SET_PCI_HW_INT 0xb10f - Index: linux-2.6-tip/arch/x86/include/asm/mach-default/setup_arch.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/setup_arch.h +++ /dev/null @@ -1,3 +0,0 @@ -/* Hook to call BIOS initialisation function */ - -/* no action for generic */ Index: linux-2.6-tip/arch/x86/include/asm/mach-default/smpboot_hooks.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-default/smpboot_hooks.h +++ /dev/null @@ -1,61 +0,0 @@ -/* two abstractions specific to kernel/smpboot.c, mainly to cater to visws - * which needs to alter them. */ - -static inline void smpboot_clear_io_apic_irqs(void) -{ -#ifdef CONFIG_X86_IO_APIC - io_apic_irqs = 0; -#endif -} - -static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip) -{ - CMOS_WRITE(0xa, 0xf); - local_flush_tlb(); - pr_debug("1.\n"); - *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) = - start_eip >> 4; - pr_debug("2.\n"); - *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = - start_eip & 0xf; - pr_debug("3.\n"); -} - -static inline void smpboot_restore_warm_reset_vector(void) -{ - /* - * Install writable page 0 entry to set BIOS data area. - */ - local_flush_tlb(); - - /* - * Paranoid: Set warm reset code and vector here back - * to default values. - */ - CMOS_WRITE(0, 0xf); - - *((volatile long *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0; -} - -static inline void __init smpboot_setup_io_apic(void) -{ -#ifdef CONFIG_X86_IO_APIC - /* - * Here we can be sure that there is an IO-APIC in the system. Let's - * go and set it up: - */ - if (!skip_ioapic_setup && nr_ioapics) - setup_IO_APIC(); - else { - nr_ioapics = 0; - localise_nmi_watchdog(); - } -#endif -} - -static inline void smpboot_clear_io_apic(void) -{ -#ifdef CONFIG_X86_IO_APIC - nr_ioapics = 0; -#endif -} Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/gpio.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/gpio.h +++ /dev/null @@ -1,15 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_GPIO_H -#define _ASM_X86_MACH_GENERIC_GPIO_H - -int gpio_request(unsigned gpio, const char *label); -void gpio_free(unsigned gpio); -int gpio_direction_input(unsigned gpio); -int gpio_direction_output(unsigned gpio, int value); -int gpio_get_value(unsigned gpio); -void gpio_set_value(unsigned gpio, int value); -int gpio_to_irq(unsigned gpio); -int irq_to_gpio(unsigned irq); - -#include /* cansleep wrappers */ - -#endif /* _ASM_X86_MACH_GENERIC_GPIO_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/mach_apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/mach_apic.h +++ /dev/null @@ -1,35 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_MACH_APIC_H -#define _ASM_X86_MACH_GENERIC_MACH_APIC_H - -#include - -#define esr_disable (genapic->ESR_DISABLE) -#define NO_BALANCE_IRQ (genapic->no_balance_irq) -#define INT_DELIVERY_MODE (genapic->int_delivery_mode) -#define INT_DEST_MODE (genapic->int_dest_mode) -#undef APIC_DEST_LOGICAL -#define APIC_DEST_LOGICAL (genapic->apic_destination_logical) -#define TARGET_CPUS (genapic->target_cpus()) -#define apic_id_registered (genapic->apic_id_registered) -#define init_apic_ldr (genapic->init_apic_ldr) -#define ioapic_phys_id_map (genapic->ioapic_phys_id_map) -#define setup_apic_routing (genapic->setup_apic_routing) -#define multi_timer_check (genapic->multi_timer_check) -#define apicid_to_node (genapic->apicid_to_node) -#define cpu_to_logical_apicid (genapic->cpu_to_logical_apicid) -#define cpu_present_to_apicid (genapic->cpu_present_to_apicid) -#define apicid_to_cpu_present (genapic->apicid_to_cpu_present) -#define setup_portio_remap (genapic->setup_portio_remap) -#define check_apicid_present (genapic->check_apicid_present) -#define check_phys_apicid_present (genapic->check_phys_apicid_present) -#define check_apicid_used (genapic->check_apicid_used) -#define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid) -#define cpu_mask_to_apicid_and (genapic->cpu_mask_to_apicid_and) -#define vector_allocation_domain (genapic->vector_allocation_domain) -#define enable_apic_mode (genapic->enable_apic_mode) -#define phys_pkg_id (genapic->phys_pkg_id) -#define wakeup_secondary_cpu (genapic->wakeup_cpu) - -extern void generic_bigsmp_probe(void); - -#endif /* _ASM_X86_MACH_GENERIC_MACH_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/mach_apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/mach_apicdef.h +++ /dev/null @@ -1,11 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_MACH_APICDEF_H -#define _ASM_X86_MACH_GENERIC_MACH_APICDEF_H - -#ifndef APIC_DEFINITION -#include - -#define GET_APIC_ID (genapic->get_apic_id) -#define APIC_ID_MASK (genapic->apic_id_mask) -#endif - -#endif /* _ASM_X86_MACH_GENERIC_MACH_APICDEF_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/mach_ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/mach_ipi.h +++ /dev/null @@ -1,10 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_MACH_IPI_H -#define _ASM_X86_MACH_GENERIC_MACH_IPI_H - -#include - -#define send_IPI_mask (genapic->send_IPI_mask) -#define send_IPI_allbutself (genapic->send_IPI_allbutself) -#define send_IPI_all (genapic->send_IPI_all) - -#endif /* _ASM_X86_MACH_GENERIC_MACH_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/mach_mpparse.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/mach_mpparse.h +++ /dev/null @@ -1,9 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_MACH_MPPARSE_H -#define _ASM_X86_MACH_GENERIC_MACH_MPPARSE_H - - -extern int mps_oem_check(struct mpc_table *, char *, char *); - -extern int acpi_madt_oem_check(char *, char *); - -#endif /* _ASM_X86_MACH_GENERIC_MACH_MPPARSE_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/mach_mpspec.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/mach_mpspec.h +++ /dev/null @@ -1,12 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_MACH_MPSPEC_H -#define _ASM_X86_MACH_GENERIC_MACH_MPSPEC_H - -#define MAX_IRQ_SOURCES 256 - -/* Summit or generic (i.e. installer) kernels need lots of bus entries. */ -/* Maximum 256 PCI busses, plus 1 ISA bus in each of 4 cabinets. */ -#define MAX_MP_BUSSES 260 - -extern void numaq_mps_oem_check(struct mpc_table *, char *, char *); - -#endif /* _ASM_X86_MACH_GENERIC_MACH_MPSPEC_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-generic/mach_wakecpu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-generic/mach_wakecpu.h +++ /dev/null @@ -1,12 +0,0 @@ -#ifndef _ASM_X86_MACH_GENERIC_MACH_WAKECPU_H -#define _ASM_X86_MACH_GENERIC_MACH_WAKECPU_H - -#define TRAMPOLINE_PHYS_LOW (genapic->trampoline_phys_low) -#define TRAMPOLINE_PHYS_HIGH (genapic->trampoline_phys_high) -#define wait_for_init_deassert (genapic->wait_for_init_deassert) -#define smp_callin_clear_local_apic (genapic->smp_callin_clear_local_apic) -#define store_NMI_vector (genapic->store_NMI_vector) -#define restore_NMI_vector (genapic->restore_NMI_vector) -#define inquire_remote_apic (genapic->inquire_remote_apic) - -#endif /* _ASM_X86_MACH_GENERIC_MACH_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-rdc321x/gpio.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-rdc321x/gpio.h +++ /dev/null @@ -1,60 +0,0 @@ -#ifndef _ASM_X86_MACH_RDC321X_GPIO_H -#define _ASM_X86_MACH_RDC321X_GPIO_H - -#include - -extern int rdc_gpio_get_value(unsigned gpio); -extern void rdc_gpio_set_value(unsigned gpio, int value); -extern int rdc_gpio_direction_input(unsigned gpio); -extern int rdc_gpio_direction_output(unsigned gpio, int value); -extern int rdc_gpio_request(unsigned gpio, const char *label); -extern void rdc_gpio_free(unsigned gpio); -extern void __init rdc321x_gpio_setup(void); - -/* Wrappers for the arch-neutral GPIO API */ - -static inline int gpio_request(unsigned gpio, const char *label) -{ - return rdc_gpio_request(gpio, label); -} - -static inline void gpio_free(unsigned gpio) -{ - might_sleep(); - rdc_gpio_free(gpio); -} - -static inline int gpio_direction_input(unsigned gpio) -{ - return rdc_gpio_direction_input(gpio); -} - -static inline int gpio_direction_output(unsigned gpio, int value) -{ - return rdc_gpio_direction_output(gpio, value); -} - -static inline int gpio_get_value(unsigned gpio) -{ - return rdc_gpio_get_value(gpio); -} - -static inline void gpio_set_value(unsigned gpio, int value) -{ - rdc_gpio_set_value(gpio, value); -} - -static inline int gpio_to_irq(unsigned gpio) -{ - return gpio; -} - -static inline int irq_to_gpio(unsigned irq) -{ - return irq; -} - -/* For cansleep */ -#include - -#endif /* _ASM_X86_MACH_RDC321X_GPIO_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach-rdc321x/rdc321x_defs.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-rdc321x/rdc321x_defs.h +++ /dev/null @@ -1,12 +0,0 @@ -#define PFX "rdc321x: " - -/* General purpose configuration and data registers */ -#define RDC3210_CFGREG_ADDR 0x0CF8 -#define RDC3210_CFGREG_DATA 0x0CFC - -#define RDC321X_GPIO_CTRL_REG1 0x48 -#define RDC321X_GPIO_CTRL_REG2 0x84 -#define RDC321X_GPIO_DATA_REG1 0x4c -#define RDC321X_GPIO_DATA_REG2 0x88 - -#define RDC321X_MAX_GPIO 58 Index: linux-2.6-tip/arch/x86/include/asm/mach-voyager/do_timer.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-voyager/do_timer.h +++ /dev/null @@ -1,17 +0,0 @@ -/* defines for inline arch setup functions */ -#include - -#include -#include - -/** - * do_timer_interrupt_hook - hook into timer tick - * - * Call the pit clock event handler. see asm/i8253.h - **/ -static inline void do_timer_interrupt_hook(void) -{ - global_clock_event->event_handler(global_clock_event); - voyager_timer_interrupt(); -} - Index: linux-2.6-tip/arch/x86/include/asm/mach-voyager/entry_arch.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-voyager/entry_arch.h +++ /dev/null @@ -1,26 +0,0 @@ -/* -*- mode: c; c-basic-offset: 8 -*- */ - -/* Copyright (C) 2002 - * - * Author: James.Bottomley@HansenPartnership.com - * - * linux/arch/i386/voyager/entry_arch.h - * - * This file builds the VIC and QIC CPI gates - */ - -/* initialise the voyager interrupt gates - * - * This uses the macros in irq.h to set up assembly jump gates. The - * calls are then redirected to the same routine with smp_ prefixed */ -BUILD_INTERRUPT(vic_sys_interrupt, VIC_SYS_INT) -BUILD_INTERRUPT(vic_cmn_interrupt, VIC_CMN_INT) -BUILD_INTERRUPT(vic_cpi_interrupt, VIC_CPI_LEVEL0); - -/* do all the QIC interrupts */ -BUILD_INTERRUPT(qic_timer_interrupt, QIC_TIMER_CPI); -BUILD_INTERRUPT(qic_invalidate_interrupt, QIC_INVALIDATE_CPI); -BUILD_INTERRUPT(qic_reschedule_interrupt, QIC_RESCHEDULE_CPI); -BUILD_INTERRUPT(qic_enable_irq_interrupt, QIC_ENABLE_IRQ_CPI); -BUILD_INTERRUPT(qic_call_function_interrupt, QIC_CALL_FUNCTION_CPI); -BUILD_INTERRUPT(qic_call_function_single_interrupt, QIC_CALL_FUNCTION_SINGLE_CPI); Index: linux-2.6-tip/arch/x86/include/asm/mach-voyager/setup_arch.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mach-voyager/setup_arch.h +++ /dev/null @@ -1,12 +0,0 @@ -#include -#include -#define VOYAGER_BIOS_INFO ((struct voyager_bios_info *) \ - (&boot_params.apm_bios_info)) - -/* Hook to call BIOS initialisation function */ - -/* for voyager, pass the voyager BIOS/SUS info area to the detection - * routines */ - -#define ARCH_SETUP voyager_detect(VOYAGER_BIOS_INFO); - Index: linux-2.6-tip/arch/x86/include/asm/mach_timer.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/mach_timer.h @@ -0,0 +1,48 @@ +/* + * Machine specific calibrate_tsc() for generic. + * Split out from timer_tsc.c by Osamu Tomita + */ +/* ------ Calibrate the TSC ------- + * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset(). + * Too much 64-bit arithmetic here to do this cleanly in C, and for + * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2) + * output busy loop as low as possible. We avoid reading the CTC registers + * directly because of the awkward 8-bit access mechanism of the 82C54 + * device. + */ +#ifndef _ASM_X86_MACH_DEFAULT_MACH_TIMER_H +#define _ASM_X86_MACH_DEFAULT_MACH_TIMER_H + +#define CALIBRATE_TIME_MSEC 30 /* 30 msecs */ +#define CALIBRATE_LATCH \ + ((CLOCK_TICK_RATE * CALIBRATE_TIME_MSEC + 1000/2)/1000) + +static inline void mach_prepare_counter(void) +{ + /* Set the Gate high, disable speaker */ + outb((inb(0x61) & ~0x02) | 0x01, 0x61); + + /* + * Now let's take care of CTC channel 2 + * + * Set the Gate high, program CTC channel 2 for mode 0, + * (interrupt on terminal count mode), binary count, + * load 5 * LATCH count, (LSB and MSB) to begin countdown. + * + * Some devices need a delay here. + */ + outb(0xb0, 0x43); /* binary, mode 0, LSB/MSB, Ch 2 */ + outb_p(CALIBRATE_LATCH & 0xff, 0x42); /* LSB of count */ + outb_p(CALIBRATE_LATCH >> 8, 0x42); /* MSB of count */ +} + +static inline void mach_countup(unsigned long *count_p) +{ + unsigned long count = 0; + do { + count++; + } while ((inb_p(0x61) & 0x20) == 0); + *count_p = count; +} + +#endif /* _ASM_X86_MACH_DEFAULT_MACH_TIMER_H */ Index: linux-2.6-tip/arch/x86/include/asm/mach_traps.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/mach_traps.h @@ -0,0 +1,33 @@ +/* + * Machine specific NMI handling for generic. + * Split out from traps.c by Osamu Tomita + */ +#ifndef _ASM_X86_MACH_DEFAULT_MACH_TRAPS_H +#define _ASM_X86_MACH_DEFAULT_MACH_TRAPS_H + +#include + +static inline unsigned char get_nmi_reason(void) +{ + return inb(0x61); +} + +static inline void reassert_nmi(void) +{ + int old_reg = -1; + + if (do_i_have_lock_cmos()) + old_reg = current_lock_cmos_reg(); + else + lock_cmos(0); /* register doesn't matter here */ + outb(0x8f, 0x70); + inb(0x71); /* dummy */ + outb(0x0f, 0x70); + inb(0x71); /* dummy */ + if (old_reg >= 0) + outb(old_reg, 0x70); + else + unlock_cmos(); +} + +#endif /* _ASM_X86_MACH_DEFAULT_MACH_TRAPS_H */ Index: linux-2.6-tip/arch/x86/include/asm/mce.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mce.h +++ linux-2.6-tip/arch/x86/include/asm/mce.h @@ -11,6 +11,8 @@ */ #define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */ +#define MCG_EXT_P (1ULL<<9) /* Extended registers available */ +#define MCG_CMCI_P (1ULL<<10) /* CMCI supported */ #define MCG_STATUS_RIPV (1UL<<0) /* restart ip valid */ #define MCG_STATUS_EIPV (1UL<<1) /* ip points to correct instruction */ @@ -90,14 +92,29 @@ extern int mce_disabled; #include +void mce_setup(struct mce *m); void mce_log(struct mce *m); DECLARE_PER_CPU(struct sys_device, device_mce); extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu); +/* + * To support more than 128 would need to escape the predefined + * Linux defined extended banks first. + */ +#define MAX_NR_BANKS (MCE_EXTENDED_BANK - 1) + #ifdef CONFIG_X86_MCE_INTEL void mce_intel_feature_init(struct cpuinfo_x86 *c); +void cmci_clear(void); +void cmci_reenable(void); +void cmci_rediscover(int dying); +void cmci_recheck(void); #else static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { } +static inline void cmci_clear(void) {} +static inline void cmci_reenable(void) {} +static inline void cmci_rediscover(int dying) {} +static inline void cmci_recheck(void) {} #endif #ifdef CONFIG_X86_MCE_AMD @@ -106,11 +123,23 @@ void mce_amd_feature_init(struct cpuinfo static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { } #endif -void mce_log_therm_throt_event(unsigned int cpu, __u64 status); +extern int mce_available(struct cpuinfo_x86 *c); + +void mce_log_therm_throt_event(__u64 status); extern atomic_t mce_entry; extern void do_machine_check(struct pt_regs *, long); + +typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS); +DECLARE_PER_CPU(mce_banks_t, mce_poll_banks); + +enum mcp_flags { + MCP_TIMESTAMP = (1 << 0), /* log time stamp */ + MCP_UC = (1 << 1), /* log uncorrected errors */ +}; +extern void machine_check_poll(enum mcp_flags flags, mce_banks_t *b); + extern int mce_notify_user(void); #endif /* !CONFIG_X86_32 */ @@ -120,8 +149,8 @@ extern void mcheck_init(struct cpuinfo_x #else #define mcheck_init(c) do { } while (0) #endif -extern void stop_mce(void); -extern void restart_mce(void); + +extern void (*mce_threshold_vector)(void); #endif /* __KERNEL__ */ #endif /* _ASM_X86_MCE_H */ Index: linux-2.6-tip/arch/x86/include/asm/mmu_context.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mmu_context.h +++ linux-2.6-tip/arch/x86/include/asm/mmu_context.h @@ -21,11 +21,54 @@ static inline void paravirt_activate_mm( int init_new_context(struct task_struct *tsk, struct mm_struct *mm); void destroy_context(struct mm_struct *mm); -#ifdef CONFIG_X86_32 -# include "mmu_context_32.h" -#else -# include "mmu_context_64.h" + +static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) +{ +#ifdef CONFIG_SMP + if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) + percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY); +#endif +} + +static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, + struct task_struct *tsk) +{ + unsigned cpu = smp_processor_id(); + + if (likely(prev != next)) { + /* stop flush ipis for the previous mm */ + cpu_clear(cpu, prev->cpu_vm_mask); +#ifdef CONFIG_SMP + percpu_write(cpu_tlbstate.state, TLBSTATE_OK); + percpu_write(cpu_tlbstate.active_mm, next); #endif + cpu_set(cpu, next->cpu_vm_mask); + + /* Re-load page tables */ + load_cr3(next->pgd); + + /* + * load the LDT, if the LDT is different: + */ + if (unlikely(prev->context.ldt != next->context.ldt)) + load_LDT_nolock(&next->context); + } +#ifdef CONFIG_SMP + else { + percpu_write(cpu_tlbstate.state, TLBSTATE_OK); + BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next); + + if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) { + /* We were in lazy tlb mode and leave_mm disabled + * tlb flush IPI delivery. We must reload CR3 + * to make sure to use no freed page tables. + */ + load_cr3(next->pgd); + load_LDT_nolock(&next->context); + } + } +#endif +} #define activate_mm(prev, next) \ do { \ @@ -33,5 +76,17 @@ do { \ switch_mm((prev), (next), NULL); \ } while (0); +#ifdef CONFIG_X86_32 +#define deactivate_mm(tsk, mm) \ +do { \ + lazy_load_gs(0); \ +} while (0) +#else +#define deactivate_mm(tsk, mm) \ +do { \ + load_gs_index(0); \ + loadsegment(fs, 0); \ +} while (0) +#endif #endif /* _ASM_X86_MMU_CONTEXT_H */ Index: linux-2.6-tip/arch/x86/include/asm/mmu_context_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mmu_context_32.h +++ /dev/null @@ -1,55 +0,0 @@ -#ifndef _ASM_X86_MMU_CONTEXT_32_H -#define _ASM_X86_MMU_CONTEXT_32_H - -static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) -{ -#ifdef CONFIG_SMP - if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK) - x86_write_percpu(cpu_tlbstate.state, TLBSTATE_LAZY); -#endif -} - -static inline void switch_mm(struct mm_struct *prev, - struct mm_struct *next, - struct task_struct *tsk) -{ - int cpu = smp_processor_id(); - - if (likely(prev != next)) { - /* stop flush ipis for the previous mm */ - cpu_clear(cpu, prev->cpu_vm_mask); -#ifdef CONFIG_SMP - x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK); - x86_write_percpu(cpu_tlbstate.active_mm, next); -#endif - cpu_set(cpu, next->cpu_vm_mask); - - /* Re-load page tables */ - load_cr3(next->pgd); - - /* - * load the LDT, if the LDT is different: - */ - if (unlikely(prev->context.ldt != next->context.ldt)) - load_LDT_nolock(&next->context); - } -#ifdef CONFIG_SMP - else { - x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK); - BUG_ON(x86_read_percpu(cpu_tlbstate.active_mm) != next); - - if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) { - /* We were in lazy tlb mode and leave_mm disabled - * tlb flush IPI delivery. We must reload %cr3. - */ - load_cr3(next->pgd); - load_LDT_nolock(&next->context); - } - } -#endif -} - -#define deactivate_mm(tsk, mm) \ - asm("movl %0,%%gs": :"r" (0)); - -#endif /* _ASM_X86_MMU_CONTEXT_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/mmu_context_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mmu_context_64.h +++ /dev/null @@ -1,54 +0,0 @@ -#ifndef _ASM_X86_MMU_CONTEXT_64_H -#define _ASM_X86_MMU_CONTEXT_64_H - -#include - -static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) -{ -#ifdef CONFIG_SMP - if (read_pda(mmu_state) == TLBSTATE_OK) - write_pda(mmu_state, TLBSTATE_LAZY); -#endif -} - -static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, - struct task_struct *tsk) -{ - unsigned cpu = smp_processor_id(); - if (likely(prev != next)) { - /* stop flush ipis for the previous mm */ - cpu_clear(cpu, prev->cpu_vm_mask); -#ifdef CONFIG_SMP - write_pda(mmu_state, TLBSTATE_OK); - write_pda(active_mm, next); -#endif - cpu_set(cpu, next->cpu_vm_mask); - load_cr3(next->pgd); - - if (unlikely(next->context.ldt != prev->context.ldt)) - load_LDT_nolock(&next->context); - } -#ifdef CONFIG_SMP - else { - write_pda(mmu_state, TLBSTATE_OK); - if (read_pda(active_mm) != next) - BUG(); - if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) { - /* We were in lazy tlb mode and leave_mm disabled - * tlb flush IPI delivery. We must reload CR3 - * to make sure to use no freed page tables. - */ - load_cr3(next->pgd); - load_LDT_nolock(&next->context); - } - } -#endif -} - -#define deactivate_mm(tsk, mm) \ -do { \ - load_gs_index(0); \ - asm volatile("movl %0,%%fs"::"r"(0)); \ -} while (0) - -#endif /* _ASM_X86_MMU_CONTEXT_64_H */ Index: linux-2.6-tip/arch/x86/include/asm/mmzone_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mmzone_32.h +++ linux-2.6-tip/arch/x86/include/asm/mmzone_32.h @@ -91,46 +91,9 @@ static inline int pfn_valid(int pfn) #endif /* CONFIG_DISCONTIGMEM */ #ifdef CONFIG_NEED_MULTIPLE_NODES - -/* - * Following are macros that are specific to this numa platform. - */ -#define reserve_bootmem(addr, size, flags) \ - reserve_bootmem_node(NODE_DATA(0), (addr), (size), (flags)) -#define alloc_bootmem(x) \ - __alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS)) -#define alloc_bootmem_nopanic(x) \ - __alloc_bootmem_node_nopanic(NODE_DATA(0), (x), SMP_CACHE_BYTES, \ - __pa(MAX_DMA_ADDRESS)) -#define alloc_bootmem_low(x) \ - __alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, 0) -#define alloc_bootmem_pages(x) \ - __alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS)) -#define alloc_bootmem_pages_nopanic(x) \ - __alloc_bootmem_node_nopanic(NODE_DATA(0), (x), PAGE_SIZE, \ - __pa(MAX_DMA_ADDRESS)) -#define alloc_bootmem_low_pages(x) \ - __alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0) -#define alloc_bootmem_node(pgdat, x) \ -({ \ - struct pglist_data __maybe_unused \ - *__alloc_bootmem_node__pgdat = (pgdat); \ - __alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, \ - __pa(MAX_DMA_ADDRESS)); \ -}) -#define alloc_bootmem_pages_node(pgdat, x) \ -({ \ - struct pglist_data __maybe_unused \ - *__alloc_bootmem_node__pgdat = (pgdat); \ - __alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, \ - __pa(MAX_DMA_ADDRESS)); \ -}) -#define alloc_bootmem_low_pages_node(pgdat, x) \ -({ \ - struct pglist_data __maybe_unused \ - *__alloc_bootmem_node__pgdat = (pgdat); \ - __alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0); \ -}) +/* always use node 0 for bootmem on this numa platform */ +#define bootmem_arch_preferred_node(__bdata, size, align, goal, limit) \ + (NODE_DATA(0)->bdata) #endif /* CONFIG_NEED_MULTIPLE_NODES */ #endif /* _ASM_X86_MMZONE_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/mpspec.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mpspec.h +++ linux-2.6-tip/arch/x86/include/asm/mpspec.h @@ -9,7 +9,18 @@ extern int apic_version[MAX_APICS]; extern int pic_mode; #ifdef CONFIG_X86_32 -#include + +/* + * Summit or generic (i.e. installer) kernels need lots of bus entries. + * Maximum 256 PCI busses, plus 1 ISA bus in each of 4 cabinets. + */ +#if CONFIG_BASE_SMALL == 0 +# define MAX_MP_BUSSES 260 +#else +# define MAX_MP_BUSSES 32 +#endif + +#define MAX_IRQ_SOURCES 256 extern unsigned int def_to_bigsmp; extern u8 apicid_2_node[]; @@ -20,15 +31,15 @@ extern int mp_bus_id_to_local[MAX_MP_BUS extern int quad_local_to_mp_bus_id [NR_CPUS/4][4]; #endif -#define MAX_APICID 256 +#define MAX_APICID 256 -#else +#else /* CONFIG_X86_64: */ -#define MAX_MP_BUSSES 256 +#define MAX_MP_BUSSES 256 /* Each PCI slot may be a combo card with its own bus. 4 IRQ pins per slot. */ -#define MAX_IRQ_SOURCES (MAX_MP_BUSSES * 4) +#define MAX_IRQ_SOURCES (MAX_MP_BUSSES * 4) -#endif +#endif /* CONFIG_X86_64 */ extern void early_find_smp_config(void); extern void early_get_smp_config(void); @@ -45,11 +56,13 @@ extern int smp_found_config; extern int mpc_default_type; extern unsigned long mp_lapic_addr; -extern void find_smp_config(void); extern void get_smp_config(void); + #ifdef CONFIG_X86_MPPARSE +extern void find_smp_config(void); extern void early_reserve_e820_mpc_new(void); #else +static inline void find_smp_config(void) { } static inline void early_reserve_e820_mpc_new(void) { } #endif @@ -64,6 +77,8 @@ extern int acpi_probe_gsi(void); #ifdef CONFIG_X86_IO_APIC extern int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin, u32 gsi, int triggering, int polarity); +extern int mp_find_ioapic(int gsi); +extern int mp_find_ioapic_pin(int ioapic, int gsi); #else static inline int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin, @@ -148,4 +163,8 @@ static inline void physid_set_mask_of_ph extern physid_mask_t phys_cpu_present_map; +extern int generic_mps_oem_check(struct mpc_table *, char *, char *); + +extern int default_acpi_madt_oem_check(char *, char *); + #endif /* _ASM_X86_MPSPEC_H */ Index: linux-2.6-tip/arch/x86/include/asm/mpspec_def.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/mpspec_def.h +++ linux-2.6-tip/arch/x86/include/asm/mpspec_def.h @@ -24,17 +24,18 @@ # endif #endif -struct intel_mp_floating { - char mpf_signature[4]; /* "_MP_" */ - unsigned int mpf_physptr; /* Configuration table address */ - unsigned char mpf_length; /* Our length (paragraphs) */ - unsigned char mpf_specification;/* Specification version */ - unsigned char mpf_checksum; /* Checksum (makes sum 0) */ - unsigned char mpf_feature1; /* Standard or configuration ? */ - unsigned char mpf_feature2; /* Bit7 set for IMCR|PIC */ - unsigned char mpf_feature3; /* Unused (0) */ - unsigned char mpf_feature4; /* Unused (0) */ - unsigned char mpf_feature5; /* Unused (0) */ +/* Intel MP Floating Pointer Structure */ +struct mpf_intel { + char signature[4]; /* "_MP_" */ + unsigned int physptr; /* Configuration table address */ + unsigned char length; /* Our length (paragraphs) */ + unsigned char specification; /* Specification version */ + unsigned char checksum; /* Checksum (makes sum 0) */ + unsigned char feature1; /* Standard or configuration ? */ + unsigned char feature2; /* Bit7 set for IMCR|PIC */ + unsigned char feature3; /* Unused (0) */ + unsigned char feature4; /* Unused (0) */ + unsigned char feature5; /* Unused (0) */ }; #define MPC_SIGNATURE "PCMP" Index: linux-2.6-tip/arch/x86/include/asm/msidef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/msidef.h +++ linux-2.6-tip/arch/x86/include/asm/msidef.h @@ -47,6 +47,7 @@ #define MSI_ADDR_DEST_ID_MASK 0x00ffff0 #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & \ MSI_ADDR_DEST_ID_MASK) +#define MSI_ADDR_EXT_DEST_ID(dest) ((dest) & 0xffffff00) #define MSI_ADDR_IR_EXT_INT (1 << 4) #define MSI_ADDR_IR_SHV (1 << 3) Index: linux-2.6-tip/arch/x86/include/asm/msr-index.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/msr-index.h +++ linux-2.6-tip/arch/x86/include/asm/msr-index.h @@ -77,6 +77,11 @@ #define MSR_IA32_MC0_ADDR 0x00000402 #define MSR_IA32_MC0_MISC 0x00000403 +/* These are consecutive and not in the normal 4er MCE bank block */ +#define MSR_IA32_MC0_CTL2 0x00000280 +#define CMCI_EN (1ULL << 30) +#define CMCI_THRESHOLD_MASK 0xffffULL + #define MSR_P6_PERFCTR0 0x000000c1 #define MSR_P6_PERFCTR1 0x000000c2 #define MSR_P6_EVNTSEL0 0x00000186 Index: linux-2.6-tip/arch/x86/include/asm/numa_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numa_32.h +++ linux-2.6-tip/arch/x86/include/asm/numa_32.h @@ -4,8 +4,12 @@ extern int pxm_to_nid(int pxm); extern void numa_remove_cpu(int cpu); -#ifdef CONFIG_NUMA +#ifdef CONFIG_HIGHMEM extern void set_highmem_pages_init(void); +#else +static inline void set_highmem_pages_init(void) +{ +} #endif #endif /* _ASM_X86_NUMA_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/numaq.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numaq.h +++ linux-2.6-tip/arch/x86/include/asm/numaq.h @@ -31,6 +31,8 @@ extern int found_numaq; extern int get_memcfg_numaq(void); +extern void *xquad_portio; + /* * SYS_CFG_DATA_PRIV_ADDR, struct eachquadmem, and struct sys_cfg_data are the */ Index: linux-2.6-tip/arch/x86/include/asm/numaq/apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numaq/apic.h +++ /dev/null @@ -1,142 +0,0 @@ -#ifndef __ASM_NUMAQ_APIC_H -#define __ASM_NUMAQ_APIC_H - -#include -#include -#include - -#define APIC_DFR_VALUE (APIC_DFR_CLUSTER) - -static inline const cpumask_t *target_cpus(void) -{ - return &CPU_MASK_ALL; -} - -#define NO_BALANCE_IRQ (1) -#define esr_disable (1) - -#define INT_DELIVERY_MODE dest_LowestPrio -#define INT_DEST_MODE 0 /* physical delivery on LOCAL quad */ - -static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid) -{ - return physid_isset(apicid, bitmap); -} -static inline unsigned long check_apicid_present(int bit) -{ - return physid_isset(bit, phys_cpu_present_map); -} -#define apicid_cluster(apicid) (apicid & 0xF0) - -static inline int apic_id_registered(void) -{ - return 1; -} - -static inline void init_apic_ldr(void) -{ - /* Already done in NUMA-Q firmware */ -} - -static inline void setup_apic_routing(void) -{ - printk("Enabling APIC mode: %s. Using %d I/O APICs\n", - "NUMA-Q", nr_ioapics); -} - -/* - * Skip adding the timer int on secondary nodes, which causes - * a small but painful rift in the time-space continuum. - */ -static inline int multi_timer_check(int apic, int irq) -{ - return apic != 0 && irq == 0; -} - -static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map) -{ - /* We don't have a good way to do this yet - hack */ - return physids_promote(0xFUL); -} - -/* Mapping from cpu number to logical apicid */ -extern u8 cpu_2_logical_apicid[]; -static inline int cpu_to_logical_apicid(int cpu) -{ - if (cpu >= nr_cpu_ids) - return BAD_APICID; - return (int)cpu_2_logical_apicid[cpu]; -} - -/* - * Supporting over 60 cpus on NUMA-Q requires a locality-dependent - * cpu to APIC ID relation to properly interact with the intelligent - * mode of the cluster controller. - */ -static inline int cpu_present_to_apicid(int mps_cpu) -{ - if (mps_cpu < 60) - return ((mps_cpu >> 2) << 4) | (1 << (mps_cpu & 0x3)); - else - return BAD_APICID; -} - -static inline int apicid_to_node(int logical_apicid) -{ - return logical_apicid >> 4; -} - -static inline physid_mask_t apicid_to_cpu_present(int logical_apicid) -{ - int node = apicid_to_node(logical_apicid); - int cpu = __ffs(logical_apicid & 0xf); - - return physid_mask_of_physid(cpu + 4*node); -} - -extern void *xquad_portio; - -static inline void setup_portio_remap(void) -{ - int num_quads = num_online_nodes(); - - if (num_quads <= 1) - return; - - printk("Remapping cross-quad port I/O for %d quads\n", num_quads); - xquad_portio = ioremap(XQUAD_PORTIO_BASE, num_quads*XQUAD_PORTIO_QUAD); - printk("xquad_portio vaddr 0x%08lx, len %08lx\n", - (u_long) xquad_portio, (u_long) num_quads*XQUAD_PORTIO_QUAD); -} - -static inline int check_phys_apicid_present(int boot_cpu_physical_apicid) -{ - return (1); -} - -static inline void enable_apic_mode(void) -{ -} - -/* - * We use physical apicids here, not logical, so just return the default - * physical broadcast to stop people from breaking us - */ -static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask) -{ - return (int) 0xF; -} - -static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask, - const struct cpumask *andmask) -{ - return (int) 0xF; -} - -/* No NUMA-Q box has a HT CPU, but it can't hurt to use the default code. */ -static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb) -{ - return cpuid_apic >> index_msb; -} - -#endif /* __ASM_NUMAQ_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/numaq/apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numaq/apicdef.h +++ /dev/null @@ -1,14 +0,0 @@ -#ifndef __ASM_NUMAQ_APICDEF_H -#define __ASM_NUMAQ_APICDEF_H - - -#define APIC_ID_MASK (0xF<<24) - -static inline unsigned get_apic_id(unsigned long x) -{ - return (((x)>>24)&0x0F); -} - -#define GET_APIC_ID(x) get_apic_id(x) - -#endif Index: linux-2.6-tip/arch/x86/include/asm/numaq/ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numaq/ipi.h +++ /dev/null @@ -1,22 +0,0 @@ -#ifndef __ASM_NUMAQ_IPI_H -#define __ASM_NUMAQ_IPI_H - -void send_IPI_mask_sequence(const struct cpumask *mask, int vector); -void send_IPI_mask_allbutself(const struct cpumask *mask, int vector); - -static inline void send_IPI_mask(const struct cpumask *mask, int vector) -{ - send_IPI_mask_sequence(mask, vector); -} - -static inline void send_IPI_allbutself(int vector) -{ - send_IPI_mask_allbutself(cpu_online_mask, vector); -} - -static inline void send_IPI_all(int vector) -{ - send_IPI_mask(cpu_online_mask, vector); -} - -#endif /* __ASM_NUMAQ_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/numaq/mpparse.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numaq/mpparse.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_NUMAQ_MPPARSE_H -#define __ASM_NUMAQ_MPPARSE_H - -extern void numaq_mps_oem_check(struct mpc_table *, char *, char *); - -#endif /* __ASM_NUMAQ_MPPARSE_H */ Index: linux-2.6-tip/arch/x86/include/asm/numaq/wakecpu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/numaq/wakecpu.h +++ /dev/null @@ -1,45 +0,0 @@ -#ifndef __ASM_NUMAQ_WAKECPU_H -#define __ASM_NUMAQ_WAKECPU_H - -/* This file copes with machines that wakeup secondary CPUs by NMIs */ - -#define TRAMPOLINE_PHYS_LOW (0x8) -#define TRAMPOLINE_PHYS_HIGH (0xa) - -/* We don't do anything here because we use NMI's to boot instead */ -static inline void wait_for_init_deassert(atomic_t *deassert) -{ -} - -/* - * Because we use NMIs rather than the INIT-STARTUP sequence to - * bootstrap the CPUs, the APIC may be in a weird state. Kick it. - */ -static inline void smp_callin_clear_local_apic(void) -{ - clear_local_APIC(); -} - -static inline void store_NMI_vector(unsigned short *high, unsigned short *low) -{ - printk("Storing NMI vector\n"); - *high = - *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)); - *low = - *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)); -} - -static inline void restore_NMI_vector(unsigned short *high, unsigned short *low) -{ - printk("Restoring NMI vector\n"); - *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) = - *high; - *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = - *low; -} - -static inline void inquire_remote_apic(int apicid) -{ -} - -#endif /* __ASM_NUMAQ_WAKECPU_H */ Index: linux-2.6-tip/arch/x86/include/asm/page.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/page.h +++ linux-2.6-tip/arch/x86/include/asm/page.h @@ -1,42 +1,11 @@ #ifndef _ASM_X86_PAGE_H #define _ASM_X86_PAGE_H -#include - -/* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 -#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) +#include #ifdef __KERNEL__ -#define __PHYSICAL_MASK ((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT) - 1) -#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1) - -/* Cast PAGE_MASK to a signed type so that it is sign-extended if - virtual addresses are 32-bits but physical addresses are larger - (ie, 32-bit PAE). */ -#define PHYSICAL_PAGE_MASK (((signed long)PAGE_MASK) & __PHYSICAL_MASK) - -/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */ -#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK) - -/* PTE_FLAGS_MASK extracts the flags from a (pte|pmd|pud|pgd)val_t */ -#define PTE_FLAGS_MASK (~PTE_PFN_MASK) - -#define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT) -#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1)) - -#define HPAGE_SHIFT PMD_SHIFT -#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT) -#define HPAGE_MASK (~(HPAGE_SIZE - 1)) -#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) - -#define HUGE_MAX_HSTATE 2 - -#ifndef __ASSEMBLY__ -#include -#endif +#include #ifdef CONFIG_X86_64 #include @@ -44,38 +13,18 @@ #include #endif /* CONFIG_X86_64 */ -#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET) - -#define VM_DATA_DEFAULT_FLAGS \ - (((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \ - VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - - #ifndef __ASSEMBLY__ -typedef struct { pgdval_t pgd; } pgd_t; -typedef struct { pgprotval_t pgprot; } pgprot_t; - -extern int page_is_ram(unsigned long pagenr); -extern int devmem_is_allowed(unsigned long pagenr); -extern void map_devmem(unsigned long pfn, unsigned long size, - pgprot_t vma_prot); -extern void unmap_devmem(unsigned long pfn, unsigned long size, - pgprot_t vma_prot); - -extern unsigned long max_low_pfn_mapped; -extern unsigned long max_pfn_mapped; - struct page; static inline void clear_user_page(void *page, unsigned long vaddr, - struct page *pg) + struct page *pg) { clear_page(page); } static inline void copy_user_page(void *to, void *from, unsigned long vaddr, - struct page *topage) + struct page *topage) { copy_page(to, from); } @@ -84,99 +33,6 @@ static inline void copy_user_page(void * alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr) #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE -static inline pgd_t native_make_pgd(pgdval_t val) -{ - return (pgd_t) { val }; -} - -static inline pgdval_t native_pgd_val(pgd_t pgd) -{ - return pgd.pgd; -} - -#if PAGETABLE_LEVELS >= 3 -#if PAGETABLE_LEVELS == 4 -typedef struct { pudval_t pud; } pud_t; - -static inline pud_t native_make_pud(pmdval_t val) -{ - return (pud_t) { val }; -} - -static inline pudval_t native_pud_val(pud_t pud) -{ - return pud.pud; -} -#else /* PAGETABLE_LEVELS == 3 */ -#include - -static inline pudval_t native_pud_val(pud_t pud) -{ - return native_pgd_val(pud.pgd); -} -#endif /* PAGETABLE_LEVELS == 4 */ - -typedef struct { pmdval_t pmd; } pmd_t; - -static inline pmd_t native_make_pmd(pmdval_t val) -{ - return (pmd_t) { val }; -} - -static inline pmdval_t native_pmd_val(pmd_t pmd) -{ - return pmd.pmd; -} -#else /* PAGETABLE_LEVELS == 2 */ -#include - -static inline pmdval_t native_pmd_val(pmd_t pmd) -{ - return native_pgd_val(pmd.pud.pgd); -} -#endif /* PAGETABLE_LEVELS >= 3 */ - -static inline pte_t native_make_pte(pteval_t val) -{ - return (pte_t) { .pte = val }; -} - -static inline pteval_t native_pte_val(pte_t pte) -{ - return pte.pte; -} - -static inline pteval_t native_pte_flags(pte_t pte) -{ - return native_pte_val(pte) & PTE_FLAGS_MASK; -} - -#define pgprot_val(x) ((x).pgprot) -#define __pgprot(x) ((pgprot_t) { (x) } ) - -#ifdef CONFIG_PARAVIRT -#include -#else /* !CONFIG_PARAVIRT */ - -#define pgd_val(x) native_pgd_val(x) -#define __pgd(x) native_make_pgd(x) - -#ifndef __PAGETABLE_PUD_FOLDED -#define pud_val(x) native_pud_val(x) -#define __pud(x) native_make_pud(x) -#endif - -#ifndef __PAGETABLE_PMD_FOLDED -#define pmd_val(x) native_pmd_val(x) -#define __pmd(x) native_make_pmd(x) -#endif - -#define pte_val(x) native_pte_val(x) -#define pte_flags(x) native_pte_flags(x) -#define __pte(x) native_make_pte(x) - -#endif /* CONFIG_PARAVIRT */ - #define __pa(x) __phys_addr((unsigned long)(x)) #define __pa_nodebug(x) __phys_addr_nodebug((unsigned long)(x)) /* __pa_symbol should be used for C visible symbols. Index: linux-2.6-tip/arch/x86/include/asm/page_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/page_32.h +++ linux-2.6-tip/arch/x86/include/asm/page_32.h @@ -1,82 +1,14 @@ #ifndef _ASM_X86_PAGE_32_H #define _ASM_X86_PAGE_32_H -/* - * This handles the memory map. - * - * A __PAGE_OFFSET of 0xC0000000 means that the kernel has - * a virtual address space of one gigabyte, which limits the - * amount of physical memory you can use to about 950MB. - * - * If you want more physical memory than this then see the CONFIG_HIGHMEM4G - * and CONFIG_HIGHMEM64G options in the kernel configuration. - */ -#define __PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL) - -#ifdef CONFIG_4KSTACKS -#define THREAD_ORDER 0 -#else -#define THREAD_ORDER 1 -#endif -#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER) - -#define STACKFAULT_STACK 0 -#define DOUBLEFAULT_STACK 1 -#define NMI_STACK 0 -#define DEBUG_STACK 0 -#define MCE_STACK 0 -#define N_EXCEPTION_STACKS 1 - -#ifdef CONFIG_X86_PAE -/* 44=32+12, the limit we can fit into an unsigned long pfn */ -#define __PHYSICAL_MASK_SHIFT 44 -#define __VIRTUAL_MASK_SHIFT 32 -#define PAGETABLE_LEVELS 3 - -#ifndef __ASSEMBLY__ -typedef u64 pteval_t; -typedef u64 pmdval_t; -typedef u64 pudval_t; -typedef u64 pgdval_t; -typedef u64 pgprotval_t; - -typedef union { - struct { - unsigned long pte_low, pte_high; - }; - pteval_t pte; -} pte_t; -#endif /* __ASSEMBLY__ - */ -#else /* !CONFIG_X86_PAE */ -#define __PHYSICAL_MASK_SHIFT 32 -#define __VIRTUAL_MASK_SHIFT 32 -#define PAGETABLE_LEVELS 2 - -#ifndef __ASSEMBLY__ -typedef unsigned long pteval_t; -typedef unsigned long pmdval_t; -typedef unsigned long pudval_t; -typedef unsigned long pgdval_t; -typedef unsigned long pgprotval_t; - -typedef union { - pteval_t pte; - pteval_t pte_low; -} pte_t; - -#endif /* __ASSEMBLY__ */ -#endif /* CONFIG_X86_PAE */ +#include #ifndef __ASSEMBLY__ -typedef struct page *pgtable_t; -#endif #ifdef CONFIG_HUGETLB_PAGE #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #endif -#ifndef __ASSEMBLY__ #define __phys_addr_nodebug(x) ((x) - PAGE_OFFSET) #ifdef CONFIG_DEBUG_VIRTUAL extern unsigned long __phys_addr(unsigned long); @@ -89,23 +21,6 @@ extern unsigned long __phys_addr(unsigne #define pfn_valid(pfn) ((pfn) < max_mapnr) #endif /* CONFIG_FLATMEM */ -extern int nx_enabled; - -/* - * This much address space is reserved for vmalloc() and iomap() - * as well as fixmap mappings. - */ -extern unsigned int __VMALLOC_RESERVE; -extern int sysctl_legacy_va_layout; - -extern void find_low_pfn_range(void); -extern unsigned long init_memory_mapping(unsigned long start, - unsigned long end); -extern void initmem_init(unsigned long, unsigned long); -extern void free_initmem(void); -extern void setup_bootmem_allocator(void); - - #ifdef CONFIG_X86_USE_3DNOW #include Index: linux-2.6-tip/arch/x86/include/asm/page_32_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/page_32_types.h @@ -0,0 +1,65 @@ +#ifndef _ASM_X86_PAGE_32_DEFS_H +#define _ASM_X86_PAGE_32_DEFS_H + +#include + +/* + * This handles the memory map. + * + * A __PAGE_OFFSET of 0xC0000000 means that the kernel has + * a virtual address space of one gigabyte, which limits the + * amount of physical memory you can use to about 950MB. + * + * If you want more physical memory than this then see the CONFIG_HIGHMEM4G + * and CONFIG_HIGHMEM64G options in the kernel configuration. + */ +#define __PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL) + +#ifdef CONFIG_4KSTACKS +#define THREAD_ORDER 0 +#else +#define THREAD_ORDER 1 +#endif +#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER) + +#define STACKFAULT_STACK 0 +#define DOUBLEFAULT_STACK 1 +#define NMI_STACK 0 +#define DEBUG_STACK 0 +#define MCE_STACK 0 +#define N_EXCEPTION_STACKS 1 + +#ifdef CONFIG_X86_PAE +/* 44=32+12, the limit we can fit into an unsigned long pfn */ +#define __PHYSICAL_MASK_SHIFT 44 +#define __VIRTUAL_MASK_SHIFT 32 + +#else /* !CONFIG_X86_PAE */ +#define __PHYSICAL_MASK_SHIFT 32 +#define __VIRTUAL_MASK_SHIFT 32 +#endif /* CONFIG_X86_PAE */ + +/* + * Kernel image size is limited to 512 MB (see in arch/x86/kernel/head_32.S) + */ +#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) + +#ifndef __ASSEMBLY__ + +/* + * This much address space is reserved for vmalloc() and iomap() + * as well as fixmap mappings. + */ +extern unsigned int __VMALLOC_RESERVE; +extern int sysctl_legacy_va_layout; + +extern void find_low_pfn_range(void); +extern unsigned long init_memory_mapping(unsigned long start, + unsigned long end); +extern void initmem_init(unsigned long, unsigned long); +extern void free_initmem(void); +extern void setup_bootmem_allocator(void); + +#endif /* !__ASSEMBLY__ */ + +#endif /* _ASM_X86_PAGE_32_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/page_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/page_64.h +++ linux-2.6-tip/arch/x86/include/asm/page_64.h @@ -1,105 +1,6 @@ #ifndef _ASM_X86_PAGE_64_H #define _ASM_X86_PAGE_64_H -#define PAGETABLE_LEVELS 4 - -#define THREAD_ORDER 1 -#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER) -#define CURRENT_MASK (~(THREAD_SIZE - 1)) - -#define EXCEPTION_STACK_ORDER 0 -#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER) - -#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1) -#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER) - -#define IRQSTACK_ORDER 2 -#define IRQSTACKSIZE (PAGE_SIZE << IRQSTACK_ORDER) - -#define STACKFAULT_STACK 1 -#define DOUBLEFAULT_STACK 2 -#define NMI_STACK 3 -#define DEBUG_STACK 4 -#define MCE_STACK 5 -#define N_EXCEPTION_STACKS 5 /* hw limit: 7 */ - -#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT) -#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1)) - -/* - * Set __PAGE_OFFSET to the most negative possible address + - * PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a - * hypervisor to fit. Choosing 16 slots here is arbitrary, but it's - * what Xen requires. - */ -#define __PAGE_OFFSET _AC(0xffff880000000000, UL) - -#define __PHYSICAL_START CONFIG_PHYSICAL_START -#define __KERNEL_ALIGN 0x200000 - -/* - * Make sure kernel is aligned to 2MB address. Catching it at compile - * time is better. Change your config file and compile the kernel - * for a 2MB aligned address (CONFIG_PHYSICAL_START) - */ -#if (CONFIG_PHYSICAL_START % __KERNEL_ALIGN) != 0 -#error "CONFIG_PHYSICAL_START must be a multiple of 2MB" -#endif - -#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START) -#define __START_KERNEL_map _AC(0xffffffff80000000, UL) - -/* See Documentation/x86_64/mm.txt for a description of the memory map. */ -#define __PHYSICAL_MASK_SHIFT 46 -#define __VIRTUAL_MASK_SHIFT 48 - -/* - * Kernel image size is limited to 512 MB (see level2_kernel_pgt in - * arch/x86/kernel/head_64.S), and it is mapped here: - */ -#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) -#define KERNEL_IMAGE_START _AC(0xffffffff80000000, UL) - -#ifndef __ASSEMBLY__ -void clear_page(void *page); -void copy_page(void *to, void *from); - -/* duplicated to the one in bootmem.h */ -extern unsigned long max_pfn; -extern unsigned long phys_base; - -extern unsigned long __phys_addr(unsigned long); -#define __phys_reloc_hide(x) (x) - -/* - * These are used to make use of C type-checking.. - */ -typedef unsigned long pteval_t; -typedef unsigned long pmdval_t; -typedef unsigned long pudval_t; -typedef unsigned long pgdval_t; -typedef unsigned long pgprotval_t; - -typedef struct page *pgtable_t; - -typedef struct { pteval_t pte; } pte_t; - -#define vmemmap ((struct page *)VMEMMAP_START) - -extern unsigned long init_memory_mapping(unsigned long start, - unsigned long end); - -extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn); -extern void free_initmem(void); - -extern void init_extra_mapping_uc(unsigned long phys, unsigned long size); -extern void init_extra_mapping_wb(unsigned long phys, unsigned long size); - -#endif /* !__ASSEMBLY__ */ - -#ifdef CONFIG_FLATMEM -#define pfn_valid(pfn) ((pfn) < max_pfn) -#endif - +#include #endif /* _ASM_X86_PAGE_64_H */ Index: linux-2.6-tip/arch/x86/include/asm/page_64_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/page_64_types.h @@ -0,0 +1,98 @@ +#ifndef _ASM_X86_PAGE_64_DEFS_H +#define _ASM_X86_PAGE_64_DEFS_H + +#define THREAD_ORDER 1 +#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER) +#define CURRENT_MASK (~(THREAD_SIZE - 1)) + +#define EXCEPTION_STACK_ORDER 0 +#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER) + +#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1) +#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER) + +#define IRQ_STACK_ORDER 2 +#define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER) + +#ifdef CONFIG_PREEMPT_RT +# define STACKFAULT_STACK 0 +# define DOUBLEFAULT_STACK 1 +# define NMI_STACK 2 +# define DEBUG_STACK 0 +# define MCE_STACK 3 +# define N_EXCEPTION_STACKS 3 /* hw limit: 7 */ +#else +# define STACKFAULT_STACK 1 +# define DOUBLEFAULT_STACK 2 +# define NMI_STACK 3 +# define DEBUG_STACK 4 +# define MCE_STACK 5 +# define N_EXCEPTION_STACKS 5 /* hw limit: 7 */ +#endif + +#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT) +#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1)) + +/* + * Set __PAGE_OFFSET to the most negative possible address + + * PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a + * hypervisor to fit. Choosing 16 slots here is arbitrary, but it's + * what Xen requires. + */ +#define __PAGE_OFFSET _AC(0xffff880000000000, UL) + +#define __PHYSICAL_START CONFIG_PHYSICAL_START +#define __KERNEL_ALIGN 0x200000 + +/* + * Make sure kernel is aligned to 2MB address. Catching it at compile + * time is better. Change your config file and compile the kernel + * for a 2MB aligned address (CONFIG_PHYSICAL_START) + */ +#if (CONFIG_PHYSICAL_START % __KERNEL_ALIGN) != 0 +#error "CONFIG_PHYSICAL_START must be a multiple of 2MB" +#endif + +#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START) +#define __START_KERNEL_map _AC(0xffffffff80000000, UL) + +/* See Documentation/x86_64/mm.txt for a description of the memory map. */ +#define __PHYSICAL_MASK_SHIFT 46 +#define __VIRTUAL_MASK_SHIFT 48 + +/* + * Kernel image size is limited to 512 MB (see level2_kernel_pgt in + * arch/x86/kernel/head_64.S), and it is mapped here: + */ +#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) +#define KERNEL_IMAGE_START _AC(0xffffffff80000000, UL) + +#ifndef __ASSEMBLY__ +void clear_page(void *page); +void copy_page(void *to, void *from); + +/* duplicated to the one in bootmem.h */ +extern unsigned long max_pfn; +extern unsigned long phys_base; + +extern unsigned long __phys_addr(unsigned long); +#define __phys_reloc_hide(x) (x) + +#define vmemmap ((struct page *)VMEMMAP_START) + +extern unsigned long init_memory_mapping(unsigned long start, + unsigned long end); + +extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn); +extern void free_initmem(void); + +extern void init_extra_mapping_uc(unsigned long phys, unsigned long size); +extern void init_extra_mapping_wb(unsigned long phys, unsigned long size); + +#endif /* !__ASSEMBLY__ */ + +#ifdef CONFIG_FLATMEM +#define pfn_valid(pfn) ((pfn) < max_pfn) +#endif + +#endif /* _ASM_X86_PAGE_64_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/page_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/page_types.h @@ -0,0 +1,59 @@ +#ifndef _ASM_X86_PAGE_DEFS_H +#define _ASM_X86_PAGE_DEFS_H + +#include + +/* PAGE_SHIFT determines the page size */ +#define PAGE_SHIFT 12 +#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) +#define PAGE_MASK (~(PAGE_SIZE-1)) + +#define __PHYSICAL_MASK ((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT) - 1) +#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1) + +/* Cast PAGE_MASK to a signed type so that it is sign-extended if + virtual addresses are 32-bits but physical addresses are larger + (ie, 32-bit PAE). */ +#define PHYSICAL_PAGE_MASK (((signed long)PAGE_MASK) & __PHYSICAL_MASK) + +#define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT) +#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1)) + +#define HPAGE_SHIFT PMD_SHIFT +#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) + +#define HUGE_MAX_HSTATE 2 + +#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET) + +#define VM_DATA_DEFAULT_FLAGS \ + (((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \ + VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#ifdef CONFIG_X86_64 +#include +#else +#include +#endif /* CONFIG_X86_64 */ + +#ifndef __ASSEMBLY__ + +enum bootmem_state { + BEFORE_BOOTMEM, + DURING_BOOTMEM, + AFTER_BOOTMEM +}; + +extern enum bootmem_state bootmem_state; + +extern int page_is_ram(unsigned long pagenr); +extern int devmem_is_allowed(unsigned long pagenr); + +extern unsigned long max_low_pfn_mapped; +extern unsigned long max_pfn_mapped; + +#endif /* !__ASSEMBLY__ */ + +#endif /* _ASM_X86_PAGE_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/paravirt.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/paravirt.h +++ linux-2.6-tip/arch/x86/include/asm/paravirt.h @@ -4,7 +4,7 @@ * para-virtualization: those hooks are defined here. */ #ifdef CONFIG_PARAVIRT -#include +#include #include /* Bitmask of what can be clobbered: usually at least eax. */ @@ -12,21 +12,38 @@ #define CLBR_EAX (1 << 0) #define CLBR_ECX (1 << 1) #define CLBR_EDX (1 << 2) +#define CLBR_EDI (1 << 3) -#ifdef CONFIG_X86_64 -#define CLBR_RSI (1 << 3) -#define CLBR_RDI (1 << 4) +#ifdef CONFIG_X86_32 +/* CLBR_ANY should match all regs platform has. For i386, that's just it */ +#define CLBR_ANY ((1 << 4) - 1) + +#define CLBR_ARG_REGS (CLBR_EAX | CLBR_EDX | CLBR_ECX) +#define CLBR_RET_REG (CLBR_EAX | CLBR_EDX) +#define CLBR_SCRATCH (0) +#else +#define CLBR_RAX CLBR_EAX +#define CLBR_RCX CLBR_ECX +#define CLBR_RDX CLBR_EDX +#define CLBR_RDI CLBR_EDI +#define CLBR_RSI (1 << 4) #define CLBR_R8 (1 << 5) #define CLBR_R9 (1 << 6) #define CLBR_R10 (1 << 7) #define CLBR_R11 (1 << 8) + #define CLBR_ANY ((1 << 9) - 1) + +#define CLBR_ARG_REGS (CLBR_RDI | CLBR_RSI | CLBR_RDX | \ + CLBR_RCX | CLBR_R8 | CLBR_R9) +#define CLBR_RET_REG (CLBR_RAX) +#define CLBR_SCRATCH (CLBR_R10 | CLBR_R11) + #include -#else -/* CLBR_ANY should match all regs platform has. For i386, that's just it */ -#define CLBR_ANY ((1 << 3) - 1) #endif /* X86_64 */ +#define CLBR_CALLEE_SAVE ((CLBR_ARG_REGS | CLBR_SCRATCH) & ~CLBR_RET_REG) + #ifndef __ASSEMBLY__ #include #include @@ -40,6 +57,14 @@ struct tss_struct; struct mm_struct; struct desc_struct; +/* + * Wrapper type for pointers to code which uses the non-standard + * calling convention. See PV_CALL_SAVE_REGS_THUNK below. + */ +struct paravirt_callee_save { + void *func; +}; + /* general info */ struct pv_info { unsigned int kernel_rpl; @@ -189,11 +214,15 @@ struct pv_irq_ops { * expected to use X86_EFLAGS_IF; all other bits * returned from save_fl are undefined, and may be ignored by * restore_fl. + * + * NOTE: These functions callers expect the callee to preserve + * more registers than the standard C calling convention. */ - unsigned long (*save_fl)(void); - void (*restore_fl)(unsigned long); - void (*irq_disable)(void); - void (*irq_enable)(void); + struct paravirt_callee_save save_fl; + struct paravirt_callee_save restore_fl; + struct paravirt_callee_save irq_disable; + struct paravirt_callee_save irq_enable; + void (*safe_halt)(void); void (*halt)(void); @@ -244,7 +273,8 @@ struct pv_mmu_ops { void (*flush_tlb_user)(void); void (*flush_tlb_kernel)(void); void (*flush_tlb_single)(unsigned long addr); - void (*flush_tlb_others)(const cpumask_t *cpus, struct mm_struct *mm, + void (*flush_tlb_others)(const struct cpumask *cpus, + struct mm_struct *mm, unsigned long va); /* Hooks for allocating and freeing a pagetable top-level */ @@ -278,18 +308,15 @@ struct pv_mmu_ops { void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte); - pteval_t (*pte_val)(pte_t); - pteval_t (*pte_flags)(pte_t); - pte_t (*make_pte)(pteval_t pte); + struct paravirt_callee_save pte_val; + struct paravirt_callee_save make_pte; - pgdval_t (*pgd_val)(pgd_t); - pgd_t (*make_pgd)(pgdval_t pgd); + struct paravirt_callee_save pgd_val; + struct paravirt_callee_save make_pgd; #if PAGETABLE_LEVELS >= 3 #ifdef CONFIG_X86_PAE void (*set_pte_atomic)(pte_t *ptep, pte_t pteval); - void (*set_pte_present)(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte); void (*pte_clear)(struct mm_struct *mm, unsigned long addr, pte_t *ptep); void (*pmd_clear)(pmd_t *pmdp); @@ -298,12 +325,12 @@ struct pv_mmu_ops { void (*set_pud)(pud_t *pudp, pud_t pudval); - pmdval_t (*pmd_val)(pmd_t); - pmd_t (*make_pmd)(pmdval_t pmd); + struct paravirt_callee_save pmd_val; + struct paravirt_callee_save make_pmd; #if PAGETABLE_LEVELS == 4 - pudval_t (*pud_val)(pud_t); - pud_t (*make_pud)(pudval_t pud); + struct paravirt_callee_save pud_val; + struct paravirt_callee_save make_pud; void (*set_pgd)(pgd_t *pudp, pgd_t pgdval); #endif /* PAGETABLE_LEVELS == 4 */ @@ -311,6 +338,7 @@ struct pv_mmu_ops { #ifdef CONFIG_HIGHPTE void *(*kmap_atomic_pte)(struct page *page, enum km_type type); + void *(*kmap_atomic_pte_direct)(struct page *page, enum km_type type); #endif struct pv_lazy_ops lazy_mode; @@ -320,7 +348,7 @@ struct pv_mmu_ops { /* Sometimes the physical address is a pfn, and sometimes its an mfn. We can tell which is which from the index. */ void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx, - unsigned long phys, pgprot_t flags); + phys_addr_t phys, pgprot_t flags); }; struct raw_spinlock; @@ -360,7 +388,7 @@ extern struct pv_lock_ops pv_lock_ops; #define paravirt_type(op) \ [paravirt_typenum] "i" (PARAVIRT_PATCH(op)), \ - [paravirt_opptr] "m" (op) + [paravirt_opptr] "i" (&(op)) #define paravirt_clobber(clobber) \ [paravirt_clobber] "i" (clobber) @@ -388,6 +416,8 @@ extern struct pv_lock_ops pv_lock_ops; asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":") unsigned paravirt_patch_nop(void); +unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len); +unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len); unsigned paravirt_patch_ignore(unsigned len); unsigned paravirt_patch_call(void *insnbuf, const void *target, u16 tgt_clobbers, @@ -412,7 +442,7 @@ int paravirt_disable_iospace(void); * offset into the paravirt_patch_template structure, and can therefore be * freely converted back into a structure offset. */ -#define PARAVIRT_CALL "call *%[paravirt_opptr];" +#define PARAVIRT_CALL "call *%c[paravirt_opptr];" /* * These macros are intended to wrap calls through one of the paravirt @@ -479,25 +509,45 @@ int paravirt_disable_iospace(void); * makes sure the incoming and outgoing types are always correct. */ #ifdef CONFIG_X86_32 -#define PVOP_VCALL_ARGS unsigned long __eax, __edx, __ecx +#define PVOP_VCALL_ARGS \ + unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx #define PVOP_CALL_ARGS PVOP_VCALL_ARGS + +#define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x)) +#define PVOP_CALL_ARG2(x) "d" ((unsigned long)(x)) +#define PVOP_CALL_ARG3(x) "c" ((unsigned long)(x)) + #define PVOP_VCALL_CLOBBERS "=a" (__eax), "=d" (__edx), \ "=c" (__ecx) #define PVOP_CALL_CLOBBERS PVOP_VCALL_CLOBBERS + +#define PVOP_VCALLEE_CLOBBERS "=a" (__eax), "=d" (__edx) +#define PVOP_CALLEE_CLOBBERS PVOP_VCALLEE_CLOBBERS + #define EXTRA_CLOBBERS #define VEXTRA_CLOBBERS -#else -#define PVOP_VCALL_ARGS unsigned long __edi, __esi, __edx, __ecx +#else /* CONFIG_X86_64 */ +#define PVOP_VCALL_ARGS \ + unsigned long __edi = __edi, __esi = __esi, \ + __edx = __edx, __ecx = __ecx #define PVOP_CALL_ARGS PVOP_VCALL_ARGS, __eax + +#define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x)) +#define PVOP_CALL_ARG2(x) "S" ((unsigned long)(x)) +#define PVOP_CALL_ARG3(x) "d" ((unsigned long)(x)) +#define PVOP_CALL_ARG4(x) "c" ((unsigned long)(x)) + #define PVOP_VCALL_CLOBBERS "=D" (__edi), \ "=S" (__esi), "=d" (__edx), \ "=c" (__ecx) - #define PVOP_CALL_CLOBBERS PVOP_VCALL_CLOBBERS, "=a" (__eax) +#define PVOP_VCALLEE_CLOBBERS "=a" (__eax) +#define PVOP_CALLEE_CLOBBERS PVOP_VCALLEE_CLOBBERS + #define EXTRA_CLOBBERS , "r8", "r9", "r10", "r11" #define VEXTRA_CLOBBERS , "rax", "r8", "r9", "r10", "r11" -#endif +#endif /* CONFIG_X86_32 */ #ifdef CONFIG_PARAVIRT_DEBUG #define PVOP_TEST_NULL(op) BUG_ON(op == NULL) @@ -505,10 +555,11 @@ int paravirt_disable_iospace(void); #define PVOP_TEST_NULL(op) ((void)op) #endif -#define __PVOP_CALL(rettype, op, pre, post, ...) \ +#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, \ + pre, post, ...) \ ({ \ rettype __ret; \ - PVOP_CALL_ARGS; \ + PVOP_CALL_ARGS; \ PVOP_TEST_NULL(op); \ /* This is 32-bit specific, but is okay in 64-bit */ \ /* since this condition will never hold */ \ @@ -516,70 +567,113 @@ int paravirt_disable_iospace(void); asm volatile(pre \ paravirt_alt(PARAVIRT_CALL) \ post \ - : PVOP_CALL_CLOBBERS \ + : call_clbr \ : paravirt_type(op), \ - paravirt_clobber(CLBR_ANY), \ + paravirt_clobber(clbr), \ ##__VA_ARGS__ \ - : "memory", "cc" EXTRA_CLOBBERS); \ + : "memory", "cc" extra_clbr); \ __ret = (rettype)((((u64)__edx) << 32) | __eax); \ } else { \ asm volatile(pre \ paravirt_alt(PARAVIRT_CALL) \ post \ - : PVOP_CALL_CLOBBERS \ + : call_clbr \ : paravirt_type(op), \ - paravirt_clobber(CLBR_ANY), \ + paravirt_clobber(clbr), \ ##__VA_ARGS__ \ - : "memory", "cc" EXTRA_CLOBBERS); \ + : "memory", "cc" extra_clbr); \ __ret = (rettype)__eax; \ } \ __ret; \ }) -#define __PVOP_VCALL(op, pre, post, ...) \ + +#define __PVOP_CALL(rettype, op, pre, post, ...) \ + ____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS, \ + EXTRA_CLOBBERS, pre, post, ##__VA_ARGS__) + +#define __PVOP_CALLEESAVE(rettype, op, pre, post, ...) \ + ____PVOP_CALL(rettype, op.func, CLBR_RET_REG, \ + PVOP_CALLEE_CLOBBERS, , \ + pre, post, ##__VA_ARGS__) + + +#define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, pre, post, ...) \ ({ \ PVOP_VCALL_ARGS; \ PVOP_TEST_NULL(op); \ asm volatile(pre \ paravirt_alt(PARAVIRT_CALL) \ post \ - : PVOP_VCALL_CLOBBERS \ + : call_clbr \ : paravirt_type(op), \ - paravirt_clobber(CLBR_ANY), \ + paravirt_clobber(clbr), \ ##__VA_ARGS__ \ - : "memory", "cc" VEXTRA_CLOBBERS); \ + : "memory", "cc" extra_clbr); \ }) +#define __PVOP_VCALL(op, pre, post, ...) \ + ____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS, \ + VEXTRA_CLOBBERS, \ + pre, post, ##__VA_ARGS__) + +#define __PVOP_VCALLEESAVE(rettype, op, pre, post, ...) \ + ____PVOP_CALL(rettype, op.func, CLBR_RET_REG, \ + PVOP_VCALLEE_CLOBBERS, , \ + pre, post, ##__VA_ARGS__) + + + #define PVOP_CALL0(rettype, op) \ __PVOP_CALL(rettype, op, "", "") #define PVOP_VCALL0(op) \ __PVOP_VCALL(op, "", "") +#define PVOP_CALLEE0(rettype, op) \ + __PVOP_CALLEESAVE(rettype, op, "", "") +#define PVOP_VCALLEE0(op) \ + __PVOP_VCALLEESAVE(op, "", "") + + #define PVOP_CALL1(rettype, op, arg1) \ - __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1))) + __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1)) #define PVOP_VCALL1(op, arg1) \ - __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1))) + __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1)) + +#define PVOP_CALLEE1(rettype, op, arg1) \ + __PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1)) +#define PVOP_VCALLEE1(op, arg1) \ + __PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1)) + #define PVOP_CALL2(rettype, op, arg1, arg2) \ - __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), \ - "1" ((unsigned long)(arg2))) + __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1), \ + PVOP_CALL_ARG2(arg2)) #define PVOP_VCALL2(op, arg1, arg2) \ - __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), \ - "1" ((unsigned long)(arg2))) + __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1), \ + PVOP_CALL_ARG2(arg2)) + +#define PVOP_CALLEE2(rettype, op, arg1, arg2) \ + __PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1), \ + PVOP_CALL_ARG2(arg2)) +#define PVOP_VCALLEE2(op, arg1, arg2) \ + __PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1), \ + PVOP_CALL_ARG2(arg2)) + #define PVOP_CALL3(rettype, op, arg1, arg2, arg3) \ - __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), \ - "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3))) + __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1), \ + PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3)) #define PVOP_VCALL3(op, arg1, arg2, arg3) \ - __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), \ - "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3))) + __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1), \ + PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3)) /* This is the only difference in x86_64. We can make it much simpler */ #ifdef CONFIG_X86_32 #define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \ __PVOP_CALL(rettype, op, \ "push %[_arg4];", "lea 4(%%esp),%%esp;", \ - "0" ((u32)(arg1)), "1" ((u32)(arg2)), \ - "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4))) + PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \ + PVOP_CALL_ARG3(arg3), [_arg4] "mr" ((u32)(arg4))) #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \ __PVOP_VCALL(op, \ "push %[_arg4];", "lea 4(%%esp),%%esp;", \ @@ -587,13 +681,13 @@ int paravirt_disable_iospace(void); "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4))) #else #define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \ - __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), \ - "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)), \ - "3"((unsigned long)(arg4))) + __PVOP_CALL(rettype, op, "", "", \ + PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \ + PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4)) #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \ - __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), \ - "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)), \ - "3"((unsigned long)(arg4))) + __PVOP_VCALL(op, "", "", \ + PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \ + PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4)) #endif static inline int paravirt_enabled(void) @@ -984,10 +1078,11 @@ static inline void __flush_tlb_single(un PVOP_VCALL1(pv_mmu_ops.flush_tlb_single, addr); } -static inline void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm, +static inline void flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, unsigned long va) { - PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, &cpumask, mm, va); + PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, cpumask, mm, va); } static inline int paravirt_pgd_alloc(struct mm_struct *mm) @@ -1040,6 +1135,14 @@ static inline void *kmap_atomic_pte(stru ret = PVOP_CALL2(unsigned long, pv_mmu_ops.kmap_atomic_pte, page, type); return (void *)ret; } + +static inline void *kmap_atomic_pte_direct(struct page *page, enum km_type type) +{ + unsigned long ret; + ret = PVOP_CALL2(unsigned long, pv_mmu_ops.kmap_atomic_pte_direct, + page, type); + return (void *)ret; +} #endif static inline void pte_update(struct mm_struct *mm, unsigned long addr, @@ -1059,13 +1162,13 @@ static inline pte_t __pte(pteval_t val) pteval_t ret; if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALL2(pteval_t, - pv_mmu_ops.make_pte, - val, (u64)val >> 32); + ret = PVOP_CALLEE2(pteval_t, + pv_mmu_ops.make_pte, + val, (u64)val >> 32); else - ret = PVOP_CALL1(pteval_t, - pv_mmu_ops.make_pte, - val); + ret = PVOP_CALLEE1(pteval_t, + pv_mmu_ops.make_pte, + val); return (pte_t) { .pte = ret }; } @@ -1075,42 +1178,25 @@ static inline pteval_t pte_val(pte_t pte pteval_t ret; if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALL2(pteval_t, pv_mmu_ops.pte_val, - pte.pte, (u64)pte.pte >> 32); + ret = PVOP_CALLEE2(pteval_t, pv_mmu_ops.pte_val, + pte.pte, (u64)pte.pte >> 32); else - ret = PVOP_CALL1(pteval_t, pv_mmu_ops.pte_val, - pte.pte); + ret = PVOP_CALLEE1(pteval_t, pv_mmu_ops.pte_val, + pte.pte); return ret; } -static inline pteval_t pte_flags(pte_t pte) -{ - pteval_t ret; - - if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALL2(pteval_t, pv_mmu_ops.pte_flags, - pte.pte, (u64)pte.pte >> 32); - else - ret = PVOP_CALL1(pteval_t, pv_mmu_ops.pte_flags, - pte.pte); - -#ifdef CONFIG_PARAVIRT_DEBUG - BUG_ON(ret & PTE_PFN_MASK); -#endif - return ret; -} - static inline pgd_t __pgd(pgdval_t val) { pgdval_t ret; if (sizeof(pgdval_t) > sizeof(long)) - ret = PVOP_CALL2(pgdval_t, pv_mmu_ops.make_pgd, - val, (u64)val >> 32); + ret = PVOP_CALLEE2(pgdval_t, pv_mmu_ops.make_pgd, + val, (u64)val >> 32); else - ret = PVOP_CALL1(pgdval_t, pv_mmu_ops.make_pgd, - val); + ret = PVOP_CALLEE1(pgdval_t, pv_mmu_ops.make_pgd, + val); return (pgd_t) { ret }; } @@ -1120,11 +1206,11 @@ static inline pgdval_t pgd_val(pgd_t pgd pgdval_t ret; if (sizeof(pgdval_t) > sizeof(long)) - ret = PVOP_CALL2(pgdval_t, pv_mmu_ops.pgd_val, - pgd.pgd, (u64)pgd.pgd >> 32); + ret = PVOP_CALLEE2(pgdval_t, pv_mmu_ops.pgd_val, + pgd.pgd, (u64)pgd.pgd >> 32); else - ret = PVOP_CALL1(pgdval_t, pv_mmu_ops.pgd_val, - pgd.pgd); + ret = PVOP_CALLEE1(pgdval_t, pv_mmu_ops.pgd_val, + pgd.pgd); return ret; } @@ -1188,11 +1274,11 @@ static inline pmd_t __pmd(pmdval_t val) pmdval_t ret; if (sizeof(pmdval_t) > sizeof(long)) - ret = PVOP_CALL2(pmdval_t, pv_mmu_ops.make_pmd, - val, (u64)val >> 32); + ret = PVOP_CALLEE2(pmdval_t, pv_mmu_ops.make_pmd, + val, (u64)val >> 32); else - ret = PVOP_CALL1(pmdval_t, pv_mmu_ops.make_pmd, - val); + ret = PVOP_CALLEE1(pmdval_t, pv_mmu_ops.make_pmd, + val); return (pmd_t) { ret }; } @@ -1202,11 +1288,11 @@ static inline pmdval_t pmd_val(pmd_t pmd pmdval_t ret; if (sizeof(pmdval_t) > sizeof(long)) - ret = PVOP_CALL2(pmdval_t, pv_mmu_ops.pmd_val, - pmd.pmd, (u64)pmd.pmd >> 32); + ret = PVOP_CALLEE2(pmdval_t, pv_mmu_ops.pmd_val, + pmd.pmd, (u64)pmd.pmd >> 32); else - ret = PVOP_CALL1(pmdval_t, pv_mmu_ops.pmd_val, - pmd.pmd); + ret = PVOP_CALLEE1(pmdval_t, pv_mmu_ops.pmd_val, + pmd.pmd); return ret; } @@ -1228,11 +1314,11 @@ static inline pud_t __pud(pudval_t val) pudval_t ret; if (sizeof(pudval_t) > sizeof(long)) - ret = PVOP_CALL2(pudval_t, pv_mmu_ops.make_pud, - val, (u64)val >> 32); + ret = PVOP_CALLEE2(pudval_t, pv_mmu_ops.make_pud, + val, (u64)val >> 32); else - ret = PVOP_CALL1(pudval_t, pv_mmu_ops.make_pud, - val); + ret = PVOP_CALLEE1(pudval_t, pv_mmu_ops.make_pud, + val); return (pud_t) { ret }; } @@ -1242,11 +1328,11 @@ static inline pudval_t pud_val(pud_t pud pudval_t ret; if (sizeof(pudval_t) > sizeof(long)) - ret = PVOP_CALL2(pudval_t, pv_mmu_ops.pud_val, - pud.pud, (u64)pud.pud >> 32); + ret = PVOP_CALLEE2(pudval_t, pv_mmu_ops.pud_val, + pud.pud, (u64)pud.pud >> 32); else - ret = PVOP_CALL1(pudval_t, pv_mmu_ops.pud_val, - pud.pud); + ret = PVOP_CALLEE1(pudval_t, pv_mmu_ops.pud_val, + pud.pud); return ret; } @@ -1286,13 +1372,6 @@ static inline void set_pte_atomic(pte_t pte.pte, pte.pte >> 32); } -static inline void set_pte_present(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte) -{ - /* 5 arg words */ - pv_mmu_ops.set_pte_present(mm, addr, ptep, pte); -} - static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { @@ -1309,12 +1388,6 @@ static inline void set_pte_atomic(pte_t set_pte(ptep, pte); } -static inline void set_pte_present(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte) -{ - set_pte(ptep, pte); -} - static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { @@ -1368,15 +1441,16 @@ static inline void arch_leave_lazy_mmu_m void arch_flush_lazy_mmu_mode(void); static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, - unsigned long phys, pgprot_t flags) + phys_addr_t phys, pgprot_t flags) { pv_mmu_ops.set_fixmap(idx, phys, flags); } void _paravirt_nop(void); -#define paravirt_nop ((void *)_paravirt_nop) +u32 _paravirt_ident_32(u32); +u64 _paravirt_ident_64(u64); -void paravirt_use_bytelocks(void); +#define paravirt_nop ((void *)_paravirt_nop) #ifdef CONFIG_SMP @@ -1426,12 +1500,37 @@ extern struct paravirt_patch_site __para __parainstructions_end[]; #ifdef CONFIG_X86_32 -#define PV_SAVE_REGS "pushl %%ecx; pushl %%edx;" -#define PV_RESTORE_REGS "popl %%edx; popl %%ecx" +#define PV_SAVE_REGS "pushl %ecx; pushl %edx;" +#define PV_RESTORE_REGS "popl %edx; popl %ecx;" + +/* save and restore all caller-save registers, except return value */ +#define PV_SAVE_ALL_CALLER_REGS "pushl %ecx;" +#define PV_RESTORE_ALL_CALLER_REGS "popl %ecx;" + #define PV_FLAGS_ARG "0" #define PV_EXTRA_CLOBBERS #define PV_VEXTRA_CLOBBERS #else +/* save and restore all caller-save registers, except return value */ +#define PV_SAVE_ALL_CALLER_REGS \ + "push %rcx;" \ + "push %rdx;" \ + "push %rsi;" \ + "push %rdi;" \ + "push %r8;" \ + "push %r9;" \ + "push %r10;" \ + "push %r11;" +#define PV_RESTORE_ALL_CALLER_REGS \ + "pop %r11;" \ + "pop %r10;" \ + "pop %r9;" \ + "pop %r8;" \ + "pop %rdi;" \ + "pop %rsi;" \ + "pop %rdx;" \ + "pop %rcx;" + /* We save some registers, but all of them, that's too much. We clobber all * caller saved registers but the argument parameter */ #define PV_SAVE_REGS "pushq %%rdi;" @@ -1441,52 +1540,76 @@ extern struct paravirt_patch_site __para #define PV_FLAGS_ARG "D" #endif +/* + * Generate a thunk around a function which saves all caller-save + * registers except for the return value. This allows C functions to + * be called from assembler code where fewer than normal registers are + * available. It may also help code generation around calls from C + * code if the common case doesn't use many registers. + * + * When a callee is wrapped in a thunk, the caller can assume that all + * arg regs and all scratch registers are preserved across the + * call. The return value in rax/eax will not be saved, even for void + * functions. + */ +#define PV_CALLEE_SAVE_REGS_THUNK(func) \ + extern typeof(func) __raw_callee_save_##func; \ + static void *__##func##__ __used = func; \ + \ + asm(".pushsection .text;" \ + "__raw_callee_save_" #func ": " \ + PV_SAVE_ALL_CALLER_REGS \ + "call " #func ";" \ + PV_RESTORE_ALL_CALLER_REGS \ + "ret;" \ + ".popsection") + +/* Get a reference to a callee-save function */ +#define PV_CALLEE_SAVE(func) \ + ((struct paravirt_callee_save) { __raw_callee_save_##func }) + +/* Promise that "func" already uses the right calling convention */ +#define __PV_IS_CALLEE_SAVE(func) \ + ((struct paravirt_callee_save) { func }) + static inline unsigned long __raw_local_save_flags(void) { unsigned long f; - asm volatile(paravirt_alt(PV_SAVE_REGS - PARAVIRT_CALL - PV_RESTORE_REGS) + asm volatile(paravirt_alt(PARAVIRT_CALL) : "=a"(f) : paravirt_type(pv_irq_ops.save_fl), paravirt_clobber(CLBR_EAX) - : "memory", "cc" PV_VEXTRA_CLOBBERS); + : "memory", "cc"); return f; } static inline void raw_local_irq_restore(unsigned long f) { - asm volatile(paravirt_alt(PV_SAVE_REGS - PARAVIRT_CALL - PV_RESTORE_REGS) + asm volatile(paravirt_alt(PARAVIRT_CALL) : "=a"(f) : PV_FLAGS_ARG(f), paravirt_type(pv_irq_ops.restore_fl), paravirt_clobber(CLBR_EAX) - : "memory", "cc" PV_EXTRA_CLOBBERS); + : "memory", "cc"); } static inline void raw_local_irq_disable(void) { - asm volatile(paravirt_alt(PV_SAVE_REGS - PARAVIRT_CALL - PV_RESTORE_REGS) + asm volatile(paravirt_alt(PARAVIRT_CALL) : : paravirt_type(pv_irq_ops.irq_disable), paravirt_clobber(CLBR_EAX) - : "memory", "eax", "cc" PV_EXTRA_CLOBBERS); + : "memory", "eax", "cc"); } static inline void raw_local_irq_enable(void) { - asm volatile(paravirt_alt(PV_SAVE_REGS - PARAVIRT_CALL - PV_RESTORE_REGS) + asm volatile(paravirt_alt(PARAVIRT_CALL) : : paravirt_type(pv_irq_ops.irq_enable), paravirt_clobber(CLBR_EAX) - : "memory", "eax", "cc" PV_EXTRA_CLOBBERS); + : "memory", "eax", "cc"); } static inline unsigned long __raw_local_irq_save(void) @@ -1529,33 +1652,49 @@ static inline unsigned long __raw_local_ .popsection +#define COND_PUSH(set, mask, reg) \ + .if ((~(set)) & mask); push %reg; .endif +#define COND_POP(set, mask, reg) \ + .if ((~(set)) & mask); pop %reg; .endif + #ifdef CONFIG_X86_64 -#define PV_SAVE_REGS \ - push %rax; \ - push %rcx; \ - push %rdx; \ - push %rsi; \ - push %rdi; \ - push %r8; \ - push %r9; \ - push %r10; \ - push %r11 -#define PV_RESTORE_REGS \ - pop %r11; \ - pop %r10; \ - pop %r9; \ - pop %r8; \ - pop %rdi; \ - pop %rsi; \ - pop %rdx; \ - pop %rcx; \ - pop %rax + +#define PV_SAVE_REGS(set) \ + COND_PUSH(set, CLBR_RAX, rax); \ + COND_PUSH(set, CLBR_RCX, rcx); \ + COND_PUSH(set, CLBR_RDX, rdx); \ + COND_PUSH(set, CLBR_RSI, rsi); \ + COND_PUSH(set, CLBR_RDI, rdi); \ + COND_PUSH(set, CLBR_R8, r8); \ + COND_PUSH(set, CLBR_R9, r9); \ + COND_PUSH(set, CLBR_R10, r10); \ + COND_PUSH(set, CLBR_R11, r11) +#define PV_RESTORE_REGS(set) \ + COND_POP(set, CLBR_R11, r11); \ + COND_POP(set, CLBR_R10, r10); \ + COND_POP(set, CLBR_R9, r9); \ + COND_POP(set, CLBR_R8, r8); \ + COND_POP(set, CLBR_RDI, rdi); \ + COND_POP(set, CLBR_RSI, rsi); \ + COND_POP(set, CLBR_RDX, rdx); \ + COND_POP(set, CLBR_RCX, rcx); \ + COND_POP(set, CLBR_RAX, rax) + #define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 8) #define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .quad, 8) #define PARA_INDIRECT(addr) *addr(%rip) #else -#define PV_SAVE_REGS pushl %eax; pushl %edi; pushl %ecx; pushl %edx -#define PV_RESTORE_REGS popl %edx; popl %ecx; popl %edi; popl %eax +#define PV_SAVE_REGS(set) \ + COND_PUSH(set, CLBR_EAX, eax); \ + COND_PUSH(set, CLBR_EDI, edi); \ + COND_PUSH(set, CLBR_ECX, ecx); \ + COND_PUSH(set, CLBR_EDX, edx) +#define PV_RESTORE_REGS(set) \ + COND_POP(set, CLBR_EDX, edx); \ + COND_POP(set, CLBR_ECX, ecx); \ + COND_POP(set, CLBR_EDI, edi); \ + COND_POP(set, CLBR_EAX, eax) + #define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 4) #define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .long, 4) #define PARA_INDIRECT(addr) *%cs:addr @@ -1567,15 +1706,15 @@ static inline unsigned long __raw_local_ #define DISABLE_INTERRUPTS(clobbers) \ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \ - PV_SAVE_REGS; \ + PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \ call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_disable); \ - PV_RESTORE_REGS;) \ + PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) #define ENABLE_INTERRUPTS(clobbers) \ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), clobbers, \ - PV_SAVE_REGS; \ + PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \ call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_enable); \ - PV_RESTORE_REGS;) + PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) #define USERGS_SYSRET32 \ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret32), \ @@ -1605,11 +1744,15 @@ static inline unsigned long __raw_local_ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE, \ swapgs) +/* + * Note: swapgs is very special, and in practise is either going to be + * implemented with a single "swapgs" instruction or something very + * special. Either way, we don't need to save any registers for + * it. + */ #define SWAPGS \ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE, \ - PV_SAVE_REGS; \ - call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs); \ - PV_RESTORE_REGS \ + call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs) \ ) #define GET_CR2_INTO_RCX \ Index: linux-2.6-tip/arch/x86/include/asm/pat.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pat.h +++ linux-2.6-tip/arch/x86/include/asm/pat.h @@ -2,13 +2,12 @@ #define _ASM_X86_PAT_H #include +#include #ifdef CONFIG_X86_PAT extern int pat_enabled; -extern void validate_pat_support(struct cpuinfo_x86 *c); #else static const int pat_enabled; -static inline void validate_pat_support(struct cpuinfo_x86 *c) { } #endif extern void pat_init(void); @@ -17,6 +16,11 @@ extern int reserve_memtype(u64 start, u6 unsigned long req_type, unsigned long *ret_type); extern int free_memtype(u64 start, u64 end); -extern void pat_disable(char *reason); +extern int kernel_map_sync_memtype(u64 base, unsigned long size, + unsigned long flag); +extern void map_devmem(unsigned long pfn, unsigned long size, + struct pgprot vma_prot); +extern void unmap_devmem(unsigned long pfn, unsigned long size, + struct pgprot vma_prot); #endif /* _ASM_X86_PAT_H */ Index: linux-2.6-tip/arch/x86/include/asm/pci-functions.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/pci-functions.h @@ -0,0 +1,19 @@ +/* + * PCI BIOS function numbering for conventional PCI BIOS + * systems + */ + +#define PCIBIOS_PCI_FUNCTION_ID 0xb1XX +#define PCIBIOS_PCI_BIOS_PRESENT 0xb101 +#define PCIBIOS_FIND_PCI_DEVICE 0xb102 +#define PCIBIOS_FIND_PCI_CLASS_CODE 0xb103 +#define PCIBIOS_GENERATE_SPECIAL_CYCLE 0xb106 +#define PCIBIOS_READ_CONFIG_BYTE 0xb108 +#define PCIBIOS_READ_CONFIG_WORD 0xb109 +#define PCIBIOS_READ_CONFIG_DWORD 0xb10a +#define PCIBIOS_WRITE_CONFIG_BYTE 0xb10b +#define PCIBIOS_WRITE_CONFIG_WORD 0xb10c +#define PCIBIOS_WRITE_CONFIG_DWORD 0xb10d +#define PCIBIOS_GET_ROUTING_OPTIONS 0xb10e +#define PCIBIOS_SET_PCI_HW_INT 0xb10f + Index: linux-2.6-tip/arch/x86/include/asm/pci.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pci.h +++ linux-2.6-tip/arch/x86/include/asm/pci.h @@ -109,11 +109,6 @@ static inline int __pcibus_to_node(const return sd->node; } -static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus) -{ - return node_to_cpumask(__pcibus_to_node(bus)); -} - static inline const struct cpumask * cpumask_of_pcibus(const struct pci_bus *bus) { Index: linux-2.6-tip/arch/x86/include/asm/pda.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pda.h +++ /dev/null @@ -1,137 +0,0 @@ -#ifndef _ASM_X86_PDA_H -#define _ASM_X86_PDA_H - -#ifndef __ASSEMBLY__ -#include -#include -#include -#include - -/* Per processor datastructure. %gs points to it while the kernel runs */ -struct x8664_pda { - struct task_struct *pcurrent; /* 0 Current process */ - unsigned long data_offset; /* 8 Per cpu data offset from linker - address */ - unsigned long kernelstack; /* 16 top of kernel stack for current */ - unsigned long oldrsp; /* 24 user rsp for system call */ - int irqcount; /* 32 Irq nesting counter. Starts -1 */ - unsigned int cpunumber; /* 36 Logical CPU number */ -#ifdef CONFIG_CC_STACKPROTECTOR - unsigned long stack_canary; /* 40 stack canary value */ - /* gcc-ABI: this canary MUST be at - offset 40!!! */ -#endif - char *irqstackptr; - short nodenumber; /* number of current node (32k max) */ - short in_bootmem; /* pda lives in bootmem */ - unsigned int __softirq_pending; - unsigned int __nmi_count; /* number of NMI on this CPUs */ - short mmu_state; - short isidle; - struct mm_struct *active_mm; - unsigned apic_timer_irqs; - unsigned irq0_irqs; - unsigned irq_resched_count; - unsigned irq_call_count; - unsigned irq_tlb_count; - unsigned irq_thermal_count; - unsigned irq_threshold_count; - unsigned irq_spurious_count; -} ____cacheline_aligned_in_smp; - -extern struct x8664_pda **_cpu_pda; -extern void pda_init(int); - -#define cpu_pda(i) (_cpu_pda[i]) - -/* - * There is no fast way to get the base address of the PDA, all the accesses - * have to mention %fs/%gs. So it needs to be done this Torvaldian way. - */ -extern void __bad_pda_field(void) __attribute__((noreturn)); - -/* - * proxy_pda doesn't actually exist, but tell gcc it is accessed for - * all PDA accesses so it gets read/write dependencies right. - */ -extern struct x8664_pda _proxy_pda; - -#define pda_offset(field) offsetof(struct x8664_pda, field) - -#define pda_to_op(op, field, val) \ -do { \ - typedef typeof(_proxy_pda.field) T__; \ - if (0) { T__ tmp__; tmp__ = (val); } /* type checking */ \ - switch (sizeof(_proxy_pda.field)) { \ - case 2: \ - asm(op "w %1,%%gs:%c2" : \ - "+m" (_proxy_pda.field) : \ - "ri" ((T__)val), \ - "i"(pda_offset(field))); \ - break; \ - case 4: \ - asm(op "l %1,%%gs:%c2" : \ - "+m" (_proxy_pda.field) : \ - "ri" ((T__)val), \ - "i" (pda_offset(field))); \ - break; \ - case 8: \ - asm(op "q %1,%%gs:%c2": \ - "+m" (_proxy_pda.field) : \ - "ri" ((T__)val), \ - "i"(pda_offset(field))); \ - break; \ - default: \ - __bad_pda_field(); \ - } \ -} while (0) - -#define pda_from_op(op, field) \ -({ \ - typeof(_proxy_pda.field) ret__; \ - switch (sizeof(_proxy_pda.field)) { \ - case 2: \ - asm(op "w %%gs:%c1,%0" : \ - "=r" (ret__) : \ - "i" (pda_offset(field)), \ - "m" (_proxy_pda.field)); \ - break; \ - case 4: \ - asm(op "l %%gs:%c1,%0": \ - "=r" (ret__): \ - "i" (pda_offset(field)), \ - "m" (_proxy_pda.field)); \ - break; \ - case 8: \ - asm(op "q %%gs:%c1,%0": \ - "=r" (ret__) : \ - "i" (pda_offset(field)), \ - "m" (_proxy_pda.field)); \ - break; \ - default: \ - __bad_pda_field(); \ - } \ - ret__; \ -}) - -#define read_pda(field) pda_from_op("mov", field) -#define write_pda(field, val) pda_to_op("mov", field, val) -#define add_pda(field, val) pda_to_op("add", field, val) -#define sub_pda(field, val) pda_to_op("sub", field, val) -#define or_pda(field, val) pda_to_op("or", field, val) - -/* This is not atomic against other CPUs -- CPU preemption needs to be off */ -#define test_and_clear_bit_pda(bit, field) \ -({ \ - int old__; \ - asm volatile("btr %2,%%gs:%c3\n\tsbbl %0,%0" \ - : "=r" (old__), "+m" (_proxy_pda.field) \ - : "dIr" (bit), "i" (pda_offset(field)) : "memory");\ - old__; \ -}) - -#endif - -#define PDA_STACKOFFSET (5*8) - -#endif /* _ASM_X86_PDA_H */ Index: linux-2.6-tip/arch/x86/include/asm/percpu.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/percpu.h +++ linux-2.6-tip/arch/x86/include/asm/percpu.h @@ -2,53 +2,12 @@ #define _ASM_X86_PERCPU_H #ifdef CONFIG_X86_64 -#include - -/* Same as asm-generic/percpu.h, except that we store the per cpu offset - in the PDA. Longer term the PDA and every per cpu variable - should be just put into a single section and referenced directly - from %gs */ - -#ifdef CONFIG_SMP -#include - -#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset) -#define __my_cpu_offset read_pda(data_offset) - -#define per_cpu_offset(x) (__per_cpu_offset(x)) - +#define __percpu_seg gs +#define __percpu_mov_op movq +#else +#define __percpu_seg fs +#define __percpu_mov_op movl #endif -#include - -DECLARE_PER_CPU(struct x8664_pda, pda); - -/* - * These are supposed to be implemented as a single instruction which - * operates on the per-cpu data base segment. x86-64 doesn't have - * that yet, so this is a fairly inefficient workaround for the - * meantime. The single instruction is atomic with respect to - * preemption and interrupts, so we need to explicitly disable - * interrupts here to achieve the same effect. However, because it - * can be used from within interrupt-disable/enable, we can't actually - * disable interrupts; disabling preemption is enough. - */ -#define x86_read_percpu(var) \ - ({ \ - typeof(per_cpu_var(var)) __tmp; \ - preempt_disable(); \ - __tmp = __get_cpu_var(var); \ - preempt_enable(); \ - __tmp; \ - }) - -#define x86_write_percpu(var, val) \ - do { \ - preempt_disable(); \ - __get_cpu_var(var) = (val); \ - preempt_enable(); \ - } while(0) - -#else /* CONFIG_X86_64 */ #ifdef __ASSEMBLY__ @@ -65,47 +24,48 @@ DECLARE_PER_CPU(struct x8664_pda, pda); * PER_CPU(cpu_gdt_descr, %ebx) */ #ifdef CONFIG_SMP -#define PER_CPU(var, reg) \ - movl %fs:per_cpu__##this_cpu_off, reg; \ +#define PER_CPU(var, reg) \ + __percpu_mov_op %__percpu_seg:per_cpu__this_cpu_off, reg; \ lea per_cpu__##var(reg), reg -#define PER_CPU_VAR(var) %fs:per_cpu__##var +#define PER_CPU_VAR(var) %__percpu_seg:per_cpu__##var #else /* ! SMP */ -#define PER_CPU(var, reg) \ - movl $per_cpu__##var, reg +#define PER_CPU(var, reg) \ + __percpu_mov_op $per_cpu__##var, reg #define PER_CPU_VAR(var) per_cpu__##var #endif /* SMP */ +#ifdef CONFIG_X86_64_SMP +#define INIT_PER_CPU_VAR(var) init_per_cpu__##var +#else +#define INIT_PER_CPU_VAR(var) per_cpu__##var +#endif + #else /* ...!ASSEMBLY */ +#include + +#ifdef CONFIG_SMP +#define __percpu_arg(x) "%%"__stringify(__percpu_seg)":%P" #x +#define __my_cpu_offset percpu_read(this_cpu_off) +#else +#define __percpu_arg(x) "%" #x +#endif + /* - * PER_CPU finds an address of a per-cpu variable. + * Initialized pointers to per-cpu variables needed for the boot + * processor need to use these macros to get the proper address + * offset from __per_cpu_load on SMP. * - * Args: - * var - variable name - * cpu - 32bit register containing the current CPU number - * - * The resulting address is stored in the "cpu" argument. - * - * Example: - * PER_CPU(cpu_gdt_descr, %ebx) + * There also must be an entry in vmlinux_64.lds.S */ -#ifdef CONFIG_SMP - -#define __my_cpu_offset x86_read_percpu(this_cpu_off) - -/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */ -#define __percpu_seg "%%fs:" - -#else /* !SMP */ - -#define __percpu_seg "" - -#endif /* SMP */ - -#include +#define DECLARE_INIT_PER_CPU(var) \ + extern typeof(per_cpu_var(var)) init_per_cpu_var(var) -/* We can use this directly for local CPU (faster). */ -DECLARE_PER_CPU(unsigned long, this_cpu_off); +#ifdef CONFIG_X86_64_SMP +#define init_per_cpu_var(var) init_per_cpu__##var +#else +#define init_per_cpu_var(var) per_cpu_var(var) +#endif /* For arch-specific code, we can use direct single-insn ops (they * don't give an lvalue though). */ @@ -120,20 +80,25 @@ do { \ } \ switch (sizeof(var)) { \ case 1: \ - asm(op "b %1,"__percpu_seg"%0" \ + asm(op "b %1,"__percpu_arg(0) \ : "+m" (var) \ : "ri" ((T__)val)); \ break; \ case 2: \ - asm(op "w %1,"__percpu_seg"%0" \ + asm(op "w %1,"__percpu_arg(0) \ : "+m" (var) \ : "ri" ((T__)val)); \ break; \ case 4: \ - asm(op "l %1,"__percpu_seg"%0" \ + asm(op "l %1,"__percpu_arg(0) \ : "+m" (var) \ : "ri" ((T__)val)); \ break; \ + case 8: \ + asm(op "q %1,"__percpu_arg(0) \ + : "+m" (var) \ + : "re" ((T__)val)); \ + break; \ default: __bad_percpu_size(); \ } \ } while (0) @@ -143,17 +108,22 @@ do { \ typeof(var) ret__; \ switch (sizeof(var)) { \ case 1: \ - asm(op "b "__percpu_seg"%1,%0" \ + asm(op "b "__percpu_arg(1)",%0" \ : "=r" (ret__) \ : "m" (var)); \ break; \ case 2: \ - asm(op "w "__percpu_seg"%1,%0" \ + asm(op "w "__percpu_arg(1)",%0" \ : "=r" (ret__) \ : "m" (var)); \ break; \ case 4: \ - asm(op "l "__percpu_seg"%1,%0" \ + asm(op "l "__percpu_arg(1)",%0" \ + : "=r" (ret__) \ + : "m" (var)); \ + break; \ + case 8: \ + asm(op "q "__percpu_arg(1)",%0" \ : "=r" (ret__) \ : "m" (var)); \ break; \ @@ -162,13 +132,30 @@ do { \ ret__; \ }) -#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var) -#define x86_write_percpu(var, val) percpu_to_op("mov", per_cpu__##var, val) -#define x86_add_percpu(var, val) percpu_to_op("add", per_cpu__##var, val) -#define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val) -#define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val) +#define percpu_read(var) percpu_from_op("mov", per_cpu__##var) +#define percpu_write(var, val) percpu_to_op("mov", per_cpu__##var, val) +#define percpu_add(var, val) percpu_to_op("add", per_cpu__##var, val) +#define percpu_sub(var, val) percpu_to_op("sub", per_cpu__##var, val) +#define percpu_and(var, val) percpu_to_op("and", per_cpu__##var, val) +#define percpu_or(var, val) percpu_to_op("or", per_cpu__##var, val) +#define percpu_xor(var, val) percpu_to_op("xor", per_cpu__##var, val) + +/* This is not atomic against other CPUs -- CPU preemption needs to be off */ +#define x86_test_and_clear_bit_percpu(bit, var) \ +({ \ + int old__; \ + asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ + : "=r" (old__), "+m" (per_cpu__##var) \ + : "dIr" (bit)); \ + old__; \ +}) + +#include + +/* We can use this directly for local CPU (faster). */ +DECLARE_PER_CPU(unsigned long, this_cpu_off); + #endif /* !__ASSEMBLY__ */ -#endif /* !CONFIG_X86_64 */ #ifdef CONFIG_SMP @@ -195,9 +182,9 @@ do { \ #define early_per_cpu_ptr(_name) (_name##_early_ptr) #define early_per_cpu_map(_name, _idx) (_name##_early_map[_idx]) #define early_per_cpu(_name, _cpu) \ - (early_per_cpu_ptr(_name) ? \ - early_per_cpu_ptr(_name)[_cpu] : \ - per_cpu(_name, _cpu)) + *(early_per_cpu_ptr(_name) ? \ + &early_per_cpu_ptr(_name)[_cpu] : \ + &per_cpu(_name, _cpu)) #else /* !CONFIG_SMP */ #define DEFINE_EARLY_PER_CPU(_type, _name, _initvalue) \ Index: linux-2.6-tip/arch/x86/include/asm/perf_counter.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/perf_counter.h @@ -0,0 +1,98 @@ +#ifndef _ASM_X86_PERF_COUNTER_H +#define _ASM_X86_PERF_COUNTER_H + +/* + * Performance counter hw details: + */ + +#define X86_PMC_MAX_GENERIC 8 +#define X86_PMC_MAX_FIXED 3 + +#define X86_PMC_IDX_GENERIC 0 +#define X86_PMC_IDX_FIXED 32 +#define X86_PMC_IDX_MAX 64 + +#define MSR_ARCH_PERFMON_PERFCTR0 0xc1 +#define MSR_ARCH_PERFMON_PERFCTR1 0xc2 + +#define MSR_ARCH_PERFMON_EVENTSEL0 0x186 +#define MSR_ARCH_PERFMON_EVENTSEL1 0x187 + +#define ARCH_PERFMON_EVENTSEL0_ENABLE (1 << 22) +#define ARCH_PERFMON_EVENTSEL_INT (1 << 20) +#define ARCH_PERFMON_EVENTSEL_OS (1 << 17) +#define ARCH_PERFMON_EVENTSEL_USR (1 << 16) + +/* + * Includes eventsel and unit mask as well: + */ +#define ARCH_PERFMON_EVENT_MASK 0xffff + +#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL 0x3c +#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8) +#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX 0 +#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \ + (1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX)) + +#define ARCH_PERFMON_BRANCH_MISSES_RETIRED 6 + +/* + * Intel "Architectural Performance Monitoring" CPUID + * detection/enumeration details: + */ +union cpuid10_eax { + struct { + unsigned int version_id:8; + unsigned int num_counters:8; + unsigned int bit_width:8; + unsigned int mask_length:8; + } split; + unsigned int full; +}; + +union cpuid10_edx { + struct { + unsigned int num_counters_fixed:4; + unsigned int reserved:28; + } split; + unsigned int full; +}; + + +/* + * Fixed-purpose performance counters: + */ + +/* + * All 3 fixed-mode PMCs are configured via this single MSR: + */ +#define MSR_ARCH_PERFMON_FIXED_CTR_CTRL 0x38d + +/* + * The counts are available in three separate MSRs: + */ + +/* Instr_Retired.Any: */ +#define MSR_ARCH_PERFMON_FIXED_CTR0 0x309 +#define X86_PMC_IDX_FIXED_INSTRUCTIONS (X86_PMC_IDX_FIXED + 0) + +/* CPU_CLK_Unhalted.Core: */ +#define MSR_ARCH_PERFMON_FIXED_CTR1 0x30a +#define X86_PMC_IDX_FIXED_CPU_CYCLES (X86_PMC_IDX_FIXED + 1) + +/* CPU_CLK_Unhalted.Ref: */ +#define MSR_ARCH_PERFMON_FIXED_CTR2 0x30b +#define X86_PMC_IDX_FIXED_BUS_CYCLES (X86_PMC_IDX_FIXED + 2) + +#define set_perf_counter_pending() \ + set_tsk_thread_flag(current, TIF_PERF_COUNTERS); + +#ifdef CONFIG_PERF_COUNTERS +extern void init_hw_perf_counters(void); +extern void perf_counters_lapic_init(int nmi); +#else +static inline void init_hw_perf_counters(void) { } +static inline void perf_counters_lapic_init(int nmi) { } +#endif + +#endif /* _ASM_X86_PERF_COUNTER_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable-2level-defs.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable-2level-defs.h +++ /dev/null @@ -1,20 +0,0 @@ -#ifndef _ASM_X86_PGTABLE_2LEVEL_DEFS_H -#define _ASM_X86_PGTABLE_2LEVEL_DEFS_H - -#define SHARED_KERNEL_PMD 0 - -/* - * traditional i386 two-level paging structure: - */ - -#define PGDIR_SHIFT 22 -#define PTRS_PER_PGD 1024 - -/* - * the i386 is two-level, so we don't really have any - * PMD directory physically. - */ - -#define PTRS_PER_PTE 1024 - -#endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable-2level.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable-2level.h +++ linux-2.6-tip/arch/x86/include/asm/pgtable-2level.h @@ -26,13 +26,6 @@ static inline void native_set_pte_atomic native_set_pte(ptep, pte); } -static inline void native_set_pte_present(struct mm_struct *mm, - unsigned long addr, - pte_t *ptep, pte_t pte) -{ - native_set_pte(ptep, pte); -} - static inline void native_pmd_clear(pmd_t *pmdp) { native_set_pmd(pmdp, __pmd(0)); @@ -53,8 +46,6 @@ static inline pte_t native_ptep_get_and_ #define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp) #endif -#define pte_none(x) (!(x).pte_low) - /* * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken, * split up the 29 bits of offset into this range: Index: linux-2.6-tip/arch/x86/include/asm/pgtable-2level_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/pgtable-2level_types.h @@ -0,0 +1,37 @@ +#ifndef _ASM_X86_PGTABLE_2LEVEL_DEFS_H +#define _ASM_X86_PGTABLE_2LEVEL_DEFS_H + +#ifndef __ASSEMBLY__ +#include + +typedef unsigned long pteval_t; +typedef unsigned long pmdval_t; +typedef unsigned long pudval_t; +typedef unsigned long pgdval_t; +typedef unsigned long pgprotval_t; + +typedef union { + pteval_t pte; + pteval_t pte_low; +} pte_t; +#endif /* !__ASSEMBLY__ */ + +#define SHARED_KERNEL_PMD 0 +#define PAGETABLE_LEVELS 2 + +/* + * traditional i386 two-level paging structure: + */ + +#define PGDIR_SHIFT 22 +#define PTRS_PER_PGD 1024 + + +/* + * the i386 is two-level, so we don't really have any + * PMD directory physically. + */ + +#define PTRS_PER_PTE 1024 + +#endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable-3level-defs.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable-3level-defs.h +++ /dev/null @@ -1,28 +0,0 @@ -#ifndef _ASM_X86_PGTABLE_3LEVEL_DEFS_H -#define _ASM_X86_PGTABLE_3LEVEL_DEFS_H - -#ifdef CONFIG_PARAVIRT -#define SHARED_KERNEL_PMD (pv_info.shared_kernel_pmd) -#else -#define SHARED_KERNEL_PMD 1 -#endif - -/* - * PGDIR_SHIFT determines what a top-level page table entry can map - */ -#define PGDIR_SHIFT 30 -#define PTRS_PER_PGD 4 - -/* - * PMD_SHIFT determines the size of the area a middle-level - * page table can map - */ -#define PMD_SHIFT 21 -#define PTRS_PER_PMD 512 - -/* - * entries per page directory level - */ -#define PTRS_PER_PTE 512 - -#endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable-3level.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable-3level.h +++ linux-2.6-tip/arch/x86/include/asm/pgtable-3level.h @@ -18,21 +18,6 @@ printk("%s:%d: bad pgd %p(%016Lx).\n", \ __FILE__, __LINE__, &(e), pgd_val(e)) -static inline int pud_none(pud_t pud) -{ - return pud_val(pud) == 0; -} - -static inline int pud_bad(pud_t pud) -{ - return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0; -} - -static inline int pud_present(pud_t pud) -{ - return pud_val(pud) & _PAGE_PRESENT; -} - /* Rules for using set_pte: the pte being assigned *must* be * either not present or in a state where the hardware will * not attempt to update the pte. In places where this is @@ -46,23 +31,6 @@ static inline void native_set_pte(pte_t ptep->pte_low = pte.pte_low; } -/* - * Since this is only called on user PTEs, and the page fault handler - * must handle the already racy situation of simultaneous page faults, - * we are justified in merely clearing the PTE present bit, followed - * by a set. The ordering here is important. - */ -static inline void native_set_pte_present(struct mm_struct *mm, - unsigned long addr, - pte_t *ptep, pte_t pte) -{ - ptep->pte_low = 0; - smp_wmb(); - ptep->pte_high = pte.pte_high; - smp_wmb(); - ptep->pte_low = pte.pte_low; -} - static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte) { set_64bit((unsigned long long *)(ptep), native_pte_val(pte)); @@ -103,6 +71,7 @@ static inline void pud_clear(pud_t *pudp { unsigned long pgd; + preempt_disable(); set_pud(pudp, __pud(0)); /* @@ -118,17 +87,9 @@ static inline void pud_clear(pud_t *pudp if (__pa(pudp) >= pgd && __pa(pudp) < (pgd + sizeof(pgd_t)*PTRS_PER_PGD)) write_cr3(pgd); + preempt_enable(); } -#define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT) - -#define pud_page_vaddr(pud) ((unsigned long) __va(pud_val(pud) & PTE_PFN_MASK)) - - -/* Find an entry in the second-level page table.. */ -#define pmd_offset(pud, address) ((pmd_t *)pud_page_vaddr(*(pud)) + \ - pmd_index(address)) - #ifdef CONFIG_SMP static inline pte_t native_ptep_get_and_clear(pte_t *ptep) { @@ -145,17 +106,6 @@ static inline pte_t native_ptep_get_and_ #define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp) #endif -#define __HAVE_ARCH_PTE_SAME -static inline int pte_same(pte_t a, pte_t b) -{ - return a.pte_low == b.pte_low && a.pte_high == b.pte_high; -} - -static inline int pte_none(pte_t pte) -{ - return !pte.pte_low && !pte.pte_high; -} - /* * Bits 0, 6 and 7 are taken in the low part of the pte, * put the 32 bits of offset into the high part. Index: linux-2.6-tip/arch/x86/include/asm/pgtable-3level_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/pgtable-3level_types.h @@ -0,0 +1,48 @@ +#ifndef _ASM_X86_PGTABLE_3LEVEL_DEFS_H +#define _ASM_X86_PGTABLE_3LEVEL_DEFS_H + +#ifndef __ASSEMBLY__ +#include + +typedef u64 pteval_t; +typedef u64 pmdval_t; +typedef u64 pudval_t; +typedef u64 pgdval_t; +typedef u64 pgprotval_t; + +typedef union { + struct { + unsigned long pte_low, pte_high; + }; + pteval_t pte; +} pte_t; +#endif /* !__ASSEMBLY__ */ + +#ifdef CONFIG_PARAVIRT +#define SHARED_KERNEL_PMD (pv_info.shared_kernel_pmd) +#else +#define SHARED_KERNEL_PMD 1 +#endif + +#define PAGETABLE_LEVELS 3 + +/* + * PGDIR_SHIFT determines what a top-level page table entry can map + */ +#define PGDIR_SHIFT 30 +#define PTRS_PER_PGD 4 + +/* + * PMD_SHIFT determines the size of the area a middle-level + * page table can map + */ +#define PMD_SHIFT 21 +#define PTRS_PER_PMD 512 + +/* + * entries per page directory level + */ +#define PTRS_PER_PTE 512 + + +#endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable.h +++ linux-2.6-tip/arch/x86/include/asm/pgtable.h @@ -1,164 +1,9 @@ #ifndef _ASM_X86_PGTABLE_H #define _ASM_X86_PGTABLE_H -#define FIRST_USER_ADDRESS 0 +#include -#define _PAGE_BIT_PRESENT 0 /* is present */ -#define _PAGE_BIT_RW 1 /* writeable */ -#define _PAGE_BIT_USER 2 /* userspace addressable */ -#define _PAGE_BIT_PWT 3 /* page write through */ -#define _PAGE_BIT_PCD 4 /* page cache disabled */ -#define _PAGE_BIT_ACCESSED 5 /* was accessed (raised by CPU) */ -#define _PAGE_BIT_DIRTY 6 /* was written to (raised by CPU) */ -#define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */ -#define _PAGE_BIT_PAT 7 /* on 4KB pages */ -#define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ -#define _PAGE_BIT_UNUSED1 9 /* available for programmer */ -#define _PAGE_BIT_IOMAP 10 /* flag used to indicate IO mapping */ -#define _PAGE_BIT_UNUSED3 11 -#define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1 -#define _PAGE_BIT_CPA_TEST _PAGE_BIT_UNUSED1 -#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */ - -/* If _PAGE_BIT_PRESENT is clear, we use these: */ -/* - if the user mapped it with PROT_NONE; pte_present gives true */ -#define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL -/* - set: nonlinear file mapping, saved PTE; unset:swap */ -#define _PAGE_BIT_FILE _PAGE_BIT_DIRTY - -#define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) -#define _PAGE_RW (_AT(pteval_t, 1) << _PAGE_BIT_RW) -#define _PAGE_USER (_AT(pteval_t, 1) << _PAGE_BIT_USER) -#define _PAGE_PWT (_AT(pteval_t, 1) << _PAGE_BIT_PWT) -#define _PAGE_PCD (_AT(pteval_t, 1) << _PAGE_BIT_PCD) -#define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED) -#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY) -#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) -#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) -#define _PAGE_UNUSED1 (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1) -#define _PAGE_IOMAP (_AT(pteval_t, 1) << _PAGE_BIT_IOMAP) -#define _PAGE_UNUSED3 (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED3) -#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT) -#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE) -#define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL) -#define _PAGE_CPA_TEST (_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST) -#define __HAVE_ARCH_PTE_SPECIAL - -#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) -#define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) -#else -#define _PAGE_NX (_AT(pteval_t, 0)) -#endif - -#define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE) -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) - -#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ - _PAGE_ACCESSED | _PAGE_DIRTY) -#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ - _PAGE_DIRTY) - -/* Set of bits not changed in pte_modify */ -#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY) - -#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) -#define _PAGE_CACHE_WB (0) -#define _PAGE_CACHE_WC (_PAGE_PWT) -#define _PAGE_CACHE_UC_MINUS (_PAGE_PCD) -#define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT) - -#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED) -#define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ - _PAGE_ACCESSED | _PAGE_NX) - -#define PAGE_SHARED_EXEC __pgprot(_PAGE_PRESENT | _PAGE_RW | \ - _PAGE_USER | _PAGE_ACCESSED) -#define PAGE_COPY_NOEXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ - _PAGE_ACCESSED | _PAGE_NX) -#define PAGE_COPY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ - _PAGE_ACCESSED) -#define PAGE_COPY PAGE_COPY_NOEXEC -#define PAGE_READONLY __pgprot(_PAGE_PRESENT | _PAGE_USER | \ - _PAGE_ACCESSED | _PAGE_NX) -#define PAGE_READONLY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ - _PAGE_ACCESSED) - -#define __PAGE_KERNEL_EXEC \ - (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL) -#define __PAGE_KERNEL (__PAGE_KERNEL_EXEC | _PAGE_NX) - -#define __PAGE_KERNEL_RO (__PAGE_KERNEL & ~_PAGE_RW) -#define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW) -#define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT) -#define __PAGE_KERNEL_WC (__PAGE_KERNEL | _PAGE_CACHE_WC) -#define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT) -#define __PAGE_KERNEL_UC_MINUS (__PAGE_KERNEL | _PAGE_PCD) -#define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER) -#define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT) -#define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE) -#define __PAGE_KERNEL_LARGE_NOCACHE (__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE) -#define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE) - -#define __PAGE_KERNEL_IO (__PAGE_KERNEL | _PAGE_IOMAP) -#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP) -#define __PAGE_KERNEL_IO_UC_MINUS (__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP) -#define __PAGE_KERNEL_IO_WC (__PAGE_KERNEL_WC | _PAGE_IOMAP) - -#define PAGE_KERNEL __pgprot(__PAGE_KERNEL) -#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO) -#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC) -#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX) -#define PAGE_KERNEL_WC __pgprot(__PAGE_KERNEL_WC) -#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE) -#define PAGE_KERNEL_UC_MINUS __pgprot(__PAGE_KERNEL_UC_MINUS) -#define PAGE_KERNEL_EXEC_NOCACHE __pgprot(__PAGE_KERNEL_EXEC_NOCACHE) -#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE) -#define PAGE_KERNEL_LARGE_NOCACHE __pgprot(__PAGE_KERNEL_LARGE_NOCACHE) -#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC) -#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL) -#define PAGE_KERNEL_VSYSCALL_NOCACHE __pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE) - -#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO) -#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE) -#define PAGE_KERNEL_IO_UC_MINUS __pgprot(__PAGE_KERNEL_IO_UC_MINUS) -#define PAGE_KERNEL_IO_WC __pgprot(__PAGE_KERNEL_IO_WC) - -/* xwr */ -#define __P000 PAGE_NONE -#define __P001 PAGE_READONLY -#define __P010 PAGE_COPY -#define __P011 PAGE_COPY -#define __P100 PAGE_READONLY_EXEC -#define __P101 PAGE_READONLY_EXEC -#define __P110 PAGE_COPY_EXEC -#define __P111 PAGE_COPY_EXEC - -#define __S000 PAGE_NONE -#define __S001 PAGE_READONLY -#define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED -#define __S100 PAGE_READONLY_EXEC -#define __S101 PAGE_READONLY_EXEC -#define __S110 PAGE_SHARED_EXEC -#define __S111 PAGE_SHARED_EXEC - -/* - * early identity mapping pte attrib macros. - */ -#ifdef CONFIG_X86_64 -#define __PAGE_KERNEL_IDENT_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC -#else -/* - * For PDE_IDENT_ATTR include USER bit. As the PDE and PTE protection - * bits are combined, this will alow user to access the high address mapped - * VDSO in the presence of CONFIG_COMPAT_VDSO - */ -#define PTE_IDENT_ATTR 0x003 /* PRESENT+RW */ -#define PDE_IDENT_ATTR 0x067 /* PRESENT+RW+USER+DIRTY+ACCESSED */ -#define PGD_IDENT_ATTR 0x001 /* PRESENT (no other attributes) */ -#endif +#include /* * Macro to mark a page protection value as UC- @@ -170,9 +15,6 @@ #ifndef __ASSEMBLY__ -#define pgprot_writecombine pgprot_writecombine -extern pgprot_t pgprot_writecombine(pgprot_t prot); - /* * ZERO_PAGE is a global shared page that is always zero: used * for zero-mapped memory areas etc.. @@ -183,6 +25,64 @@ extern unsigned long empty_zero_page[PAG extern spinlock_t pgd_lock; extern struct list_head pgd_list; +#ifdef CONFIG_PARAVIRT +#include +#else /* !CONFIG_PARAVIRT */ +#define set_pte(ptep, pte) native_set_pte(ptep, pte) +#define set_pte_at(mm, addr, ptep, pte) native_set_pte_at(mm, addr, ptep, pte) + +#define set_pte_atomic(ptep, pte) \ + native_set_pte_atomic(ptep, pte) + +#define set_pmd(pmdp, pmd) native_set_pmd(pmdp, pmd) + +#ifndef __PAGETABLE_PUD_FOLDED +#define set_pgd(pgdp, pgd) native_set_pgd(pgdp, pgd) +#define pgd_clear(pgd) native_pgd_clear(pgd) +#endif + +#ifndef set_pud +# define set_pud(pudp, pud) native_set_pud(pudp, pud) +#endif + +#ifndef __PAGETABLE_PMD_FOLDED +#define pud_clear(pud) native_pud_clear(pud) +#endif + +#define pte_clear(mm, addr, ptep) native_pte_clear(mm, addr, ptep) +#define pmd_clear(pmd) native_pmd_clear(pmd) + +#define pte_update(mm, addr, ptep) do { } while (0) +#define pte_update_defer(mm, addr, ptep) do { } while (0) + +static inline void __init paravirt_pagetable_setup_start(pgd_t *base) +{ + native_pagetable_setup_start(base); +} + +static inline void __init paravirt_pagetable_setup_done(pgd_t *base) +{ + native_pagetable_setup_done(base); +} + +#define pgd_val(x) native_pgd_val(x) +#define __pgd(x) native_make_pgd(x) + +#ifndef __PAGETABLE_PUD_FOLDED +#define pud_val(x) native_pud_val(x) +#define __pud(x) native_make_pud(x) +#endif + +#ifndef __PAGETABLE_PMD_FOLDED +#define pmd_val(x) native_pmd_val(x) +#define __pmd(x) native_make_pmd(x) +#endif + +#define pte_val(x) native_pte_val(x) +#define __pte(x) native_make_pte(x) + +#endif /* CONFIG_PARAVIRT */ + /* * The following only work if pte_present() is true. * Undefined behaviour if not.. @@ -236,72 +136,84 @@ static inline unsigned long pte_pfn(pte_ static inline int pmd_large(pmd_t pte) { - return (pmd_val(pte) & (_PAGE_PSE | _PAGE_PRESENT)) == + return (pmd_flags(pte) & (_PAGE_PSE | _PAGE_PRESENT)) == (_PAGE_PSE | _PAGE_PRESENT); } +static inline pte_t pte_set_flags(pte_t pte, pteval_t set) +{ + pteval_t v = native_pte_val(pte); + + return native_make_pte(v | set); +} + +static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) +{ + pteval_t v = native_pte_val(pte); + + return native_make_pte(v & ~clear); +} + static inline pte_t pte_mkclean(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY); } static inline pte_t pte_mkold(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_ACCESSED); + return pte_clear_flags(pte, _PAGE_ACCESSED); } static inline pte_t pte_wrprotect(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_RW); + return pte_clear_flags(pte, _PAGE_RW); } static inline pte_t pte_mkexec(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_NX); + return pte_clear_flags(pte, _PAGE_NX); } static inline pte_t pte_mkdirty(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_DIRTY); + return pte_set_flags(pte, _PAGE_DIRTY); } static inline pte_t pte_mkyoung(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_ACCESSED); + return pte_set_flags(pte, _PAGE_ACCESSED); } static inline pte_t pte_mkwrite(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_RW); + return pte_set_flags(pte, _PAGE_RW); } static inline pte_t pte_mkhuge(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_PSE); + return pte_set_flags(pte, _PAGE_PSE); } static inline pte_t pte_clrhuge(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_PSE); + return pte_clear_flags(pte, _PAGE_PSE); } static inline pte_t pte_mkglobal(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_GLOBAL); + return pte_set_flags(pte, _PAGE_GLOBAL); } static inline pte_t pte_clrglobal(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_GLOBAL); + return pte_clear_flags(pte, _PAGE_GLOBAL); } static inline pte_t pte_mkspecial(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SPECIAL); + return pte_set_flags(pte, _PAGE_SPECIAL); } -extern pteval_t __supported_pte_mask; - /* * Mask out unsupported bits in a present pgprot. Non-present pgprots * can use those bits for other purposes, so leave them be. @@ -374,82 +286,202 @@ static inline int is_new_memtype_allowed return 1; } -#ifndef __ASSEMBLY__ -/* Indicate that x86 has its own track and untrack pfn vma functions */ -#define __HAVE_PFNMAP_TRACKING - -#define __HAVE_PHYS_MEM_ACCESS_PROT -struct file; -pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, - unsigned long size, pgprot_t vma_prot); -int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, - unsigned long size, pgprot_t *vma_prot); -#endif - -/* Install a pte for a particular vaddr in kernel space. */ -void set_pte_vaddr(unsigned long vaddr, pte_t pte); +pmd_t *populate_extra_pmd(unsigned long vaddr); +pte_t *populate_extra_pte(unsigned long vaddr); +#endif /* __ASSEMBLY__ */ #ifdef CONFIG_X86_32 -extern void native_pagetable_setup_start(pgd_t *base); -extern void native_pagetable_setup_done(pgd_t *base); +# include "pgtable_32.h" #else -static inline void native_pagetable_setup_start(pgd_t *base) {} -static inline void native_pagetable_setup_done(pgd_t *base) {} +# include "pgtable_64.h" #endif -struct seq_file; -extern void arch_report_meminfo(struct seq_file *m); +#ifndef __ASSEMBLY__ +#include -#ifdef CONFIG_PARAVIRT -#include -#else /* !CONFIG_PARAVIRT */ -#define set_pte(ptep, pte) native_set_pte(ptep, pte) -#define set_pte_at(mm, addr, ptep, pte) native_set_pte_at(mm, addr, ptep, pte) +static inline int pte_none(pte_t pte) +{ + return !pte.pte; +} -#define set_pte_present(mm, addr, ptep, pte) \ - native_set_pte_present(mm, addr, ptep, pte) -#define set_pte_atomic(ptep, pte) \ - native_set_pte_atomic(ptep, pte) +#define __HAVE_ARCH_PTE_SAME +static inline int pte_same(pte_t a, pte_t b) +{ + return a.pte == b.pte; +} -#define set_pmd(pmdp, pmd) native_set_pmd(pmdp, pmd) +static inline int pte_present(pte_t a) +{ + return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE); +} -#ifndef __PAGETABLE_PUD_FOLDED -#define set_pgd(pgdp, pgd) native_set_pgd(pgdp, pgd) -#define pgd_clear(pgd) native_pgd_clear(pgd) -#endif +static inline int pmd_present(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_PRESENT; +} -#ifndef set_pud -# define set_pud(pudp, pud) native_set_pud(pudp, pud) -#endif +static inline int pmd_none(pmd_t pmd) +{ + /* Only check low word on 32-bit platforms, since it might be + out of sync with upper half. */ + return (unsigned long)native_pmd_val(pmd) == 0; +} -#ifndef __PAGETABLE_PMD_FOLDED -#define pud_clear(pud) native_pud_clear(pud) -#endif +static inline unsigned long pmd_page_vaddr(pmd_t pmd) +{ + return (unsigned long)__va(pmd_val(pmd) & PTE_PFN_MASK); +} -#define pte_clear(mm, addr, ptep) native_pte_clear(mm, addr, ptep) -#define pmd_clear(pmd) native_pmd_clear(pmd) +/* + * Currently stuck as a macro due to indirect forward reference to + * linux/mmzone.h's __section_mem_map_addr() definition: + */ +#define pmd_page(pmd) pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT) -#define pte_update(mm, addr, ptep) do { } while (0) -#define pte_update_defer(mm, addr, ptep) do { } while (0) +/* + * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD] + * + * this macro returns the index of the entry in the pmd page which would + * control the given virtual address + */ +static inline unsigned pmd_index(unsigned long address) +{ + return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); +} -static inline void __init paravirt_pagetable_setup_start(pgd_t *base) +/* + * Conversion functions: convert a page and protection to a page entry, + * and a page entry and page directory to the page they refer to. + * + * (Currently stuck as a macro because of indirect forward reference + * to linux/mm.h:page_to_nid()) + */ +#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) + +/* + * the pte page can be thought of an array like this: pte_t[PTRS_PER_PTE] + * + * this function returns the index of the entry in the pte page which would + * control the given virtual address + */ +static inline unsigned pte_index(unsigned long address) { - native_pagetable_setup_start(base); + return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); } -static inline void __init paravirt_pagetable_setup_done(pgd_t *base) +static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address) { - native_pagetable_setup_done(base); + return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address); } -#endif /* CONFIG_PARAVIRT */ -#endif /* __ASSEMBLY__ */ +static inline int pmd_bad(pmd_t pmd) +{ + return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE; +} -#ifdef CONFIG_X86_32 -# include "pgtable_32.h" +static inline unsigned long pages_to_mb(unsigned long npg) +{ + return npg >> (20 - PAGE_SHIFT); +} + +#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \ + remap_pfn_range(vma, vaddr, pfn, size, prot) + +#if PAGETABLE_LEVELS > 2 +static inline int pud_none(pud_t pud) +{ + return native_pud_val(pud) == 0; +} + +static inline int pud_present(pud_t pud) +{ + return pud_flags(pud) & _PAGE_PRESENT; +} + +static inline unsigned long pud_page_vaddr(pud_t pud) +{ + return (unsigned long)__va((unsigned long)pud_val(pud) & PTE_PFN_MASK); +} + +/* + * Currently stuck as a macro due to indirect forward reference to + * linux/mmzone.h's __section_mem_map_addr() definition: + */ +#define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT) + +/* Find an entry in the second-level page table.. */ +static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) +{ + return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); +} + +static inline unsigned long pmd_pfn(pmd_t pmd) +{ + return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT; +} + +static inline int pud_large(pud_t pud) +{ + return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) == + (_PAGE_PSE | _PAGE_PRESENT); +} + +static inline int pud_bad(pud_t pud) +{ + return (pud_flags(pud) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0; +} #else -# include "pgtable_64.h" -#endif +static inline int pud_large(pud_t pud) +{ + return 0; +} +#endif /* PAGETABLE_LEVELS > 2 */ + +#if PAGETABLE_LEVELS > 3 +static inline int pgd_present(pgd_t pgd) +{ + return pgd_flags(pgd) & _PAGE_PRESENT; +} + +static inline unsigned long pgd_page_vaddr(pgd_t pgd) +{ + return (unsigned long)__va((unsigned long)pgd_val(pgd) & PTE_PFN_MASK); +} + +/* + * Currently stuck as a macro due to indirect forward reference to + * linux/mmzone.h's __section_mem_map_addr() definition: + */ +#define pgd_page(pgd) pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT) + +/* to find an entry in a page-table-directory. */ +static inline unsigned pud_index(unsigned long address) +{ + return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1); +} + +static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address) +{ + return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(address); +} + +static inline int pgd_bad(pgd_t pgd) +{ + return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE; +} + +static inline int pgd_none(pgd_t pgd) +{ + return !native_pgd_val(pgd); +} +#endif /* PAGETABLE_LEVELS > 3 */ + +static inline int pte_hidden(pte_t pte) +{ + return pte_flags(pte) & _PAGE_HIDDEN; +} + +#endif /* __ASSEMBLY__ */ /* * the pgd page can be thought of an array like this: pgd_t[PTRS_PER_PGD] @@ -476,28 +508,6 @@ static inline void __init paravirt_paget #ifndef __ASSEMBLY__ -enum { - PG_LEVEL_NONE, - PG_LEVEL_4K, - PG_LEVEL_2M, - PG_LEVEL_1G, - PG_LEVEL_NUM -}; - -#ifdef CONFIG_PROC_FS -extern void update_page_count(int level, unsigned long pages); -#else -static inline void update_page_count(int level, unsigned long pages) { } -#endif - -/* - * Helper function that returns the kernel pagetable entry controlling - * the virtual address 'address'. NULL means no pagetable entry present. - * NOTE: the return type is pte_t but if the pmd is PSE then we return it - * as a pte too. - */ -extern pte_t *lookup_address(unsigned long address, unsigned int *level); - /* local pte updates need not use xchg for locking */ static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep) { Index: linux-2.6-tip/arch/x86/include/asm/pgtable_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable_32.h +++ linux-2.6-tip/arch/x86/include/asm/pgtable_32.h @@ -1,6 +1,7 @@ #ifndef _ASM_X86_PGTABLE_32_H #define _ASM_X86_PGTABLE_32_H +#include /* * The Linux memory management assumes a three-level page table setup. On @@ -33,47 +34,6 @@ void paging_init(void); extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t); -/* - * The Linux x86 paging architecture is 'compile-time dual-mode', it - * implements both the traditional 2-level x86 page tables and the - * newer 3-level PAE-mode page tables. - */ -#ifdef CONFIG_X86_PAE -# include -# define PMD_SIZE (1UL << PMD_SHIFT) -# define PMD_MASK (~(PMD_SIZE - 1)) -#else -# include -#endif - -#define PGDIR_SIZE (1UL << PGDIR_SHIFT) -#define PGDIR_MASK (~(PGDIR_SIZE - 1)) - -/* Just any arbitrary offset to the start of the vmalloc VM area: the - * current 8MB value just means that there will be a 8MB "hole" after the - * physical memory until the kernel virtual memory starts. That means that - * any out-of-bounds memory accesses will hopefully be caught. - * The vmalloc() routines leaves a hole of 4kB between each vmalloced - * area for the same reason. ;) - */ -#define VMALLOC_OFFSET (8 * 1024 * 1024) -#define VMALLOC_START ((unsigned long)high_memory + VMALLOC_OFFSET) -#ifdef CONFIG_X86_PAE -#define LAST_PKMAP 512 -#else -#define LAST_PKMAP 1024 -#endif - -#define PKMAP_BASE ((FIXADDR_BOOT_START - PAGE_SIZE * (LAST_PKMAP + 1)) \ - & PMD_MASK) - -#ifdef CONFIG_HIGHMEM -# define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE) -#else -# define VMALLOC_END (FIXADDR_START - 2 * PAGE_SIZE) -#endif - -#define MAXMEM (VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE) /* * Define this if things work differently on an i386 and an i486: @@ -82,58 +42,12 @@ extern void set_pmd_pfn(unsigned long, u */ #undef TEST_ACCESS_OK -/* The boot page tables (all created as a single array) */ -extern unsigned long pg0[]; - -#define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE)) - -/* To avoid harmful races, pmd_none(x) should check only the lower when PAE */ -#define pmd_none(x) (!(unsigned long)pmd_val((x))) -#define pmd_present(x) (pmd_val((x)) & _PAGE_PRESENT) -#define pmd_bad(x) ((pmd_val(x) & (PTE_FLAGS_MASK & ~_PAGE_USER)) != _KERNPG_TABLE) - -#define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT)) - #ifdef CONFIG_X86_PAE # include #else # include #endif -/* - * Conversion functions: convert a page and protection to a page entry, - * and a page entry and page directory to the page they refer to. - */ -#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) - - -static inline int pud_large(pud_t pud) { return 0; } - -/* - * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD] - * - * this macro returns the index of the entry in the pmd page which would - * control the given virtual address - */ -#define pmd_index(address) \ - (((address) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)) - -/* - * the pte page can be thought of an array like this: pte_t[PTRS_PER_PTE] - * - * this macro returns the index of the entry in the pte page which would - * control the given virtual address - */ -#define pte_index(address) \ - (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) -#define pte_offset_kernel(dir, address) \ - ((pte_t *)pmd_page_vaddr(*(dir)) + pte_index((address))) - -#define pmd_page(pmd) (pfn_to_page(pmd_val((pmd)) >> PAGE_SHIFT)) - -#define pmd_page_vaddr(pmd) \ - ((unsigned long)__va(pmd_val((pmd)) & PTE_PFN_MASK)) - #if defined(CONFIG_HIGHPTE) #define pte_offset_map(dir, address) \ ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE0) + \ @@ -141,14 +55,20 @@ static inline int pud_large(pud_t pud) { #define pte_offset_map_nested(dir, address) \ ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE1) + \ pte_index((address))) +#define pte_offset_map_direct(dir, address) \ + ((pte_t *)kmap_atomic_pte_direct(pmd_page(*(dir)), KM_PTE0) + \ + pte_index((address))) #define pte_unmap(pte) kunmap_atomic((pte), KM_PTE0) #define pte_unmap_nested(pte) kunmap_atomic((pte), KM_PTE1) +#define pte_unmap_direct(pte) kunmap_atomic_direct((pte), KM_PTE0) #else #define pte_offset_map(dir, address) \ ((pte_t *)page_address(pmd_page(*(dir))) + pte_index((address))) #define pte_offset_map_nested(dir, address) pte_offset_map((dir), (address)) +#define pte_offset_map_direct(dir, address) pte_offset_map((dir), (address)) #define pte_unmap(pte) do { } while (0) #define pte_unmap_nested(pte) do { } while (0) +#define pte_unmap_direct(pte) do { } while (0) #endif /* Clear a kernel PTE and flush it from the TLB */ @@ -176,7 +96,4 @@ do { \ #define kern_addr_valid(kaddr) (0) #endif -#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \ - remap_pfn_range(vma, vaddr, pfn, size, prot) - #endif /* _ASM_X86_PGTABLE_32_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable_32_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/pgtable_32_types.h @@ -0,0 +1,51 @@ +#ifndef _ASM_X86_PGTABLE_32_DEFS_H +#define _ASM_X86_PGTABLE_32_DEFS_H + +/* + * The Linux x86 paging architecture is 'compile-time dual-mode', it + * implements both the traditional 2-level x86 page tables and the + * newer 3-level PAE-mode page tables. + */ +#ifdef CONFIG_X86_PAE +# include +# define PMD_SIZE (1UL << PMD_SHIFT) +# define PMD_MASK (~(PMD_SIZE - 1)) +#else +# include +#endif + +#define PGDIR_SIZE (1UL << PGDIR_SHIFT) +#define PGDIR_MASK (~(PGDIR_SIZE - 1)) + +/* Just any arbitrary offset to the start of the vmalloc VM area: the + * current 8MB value just means that there will be a 8MB "hole" after the + * physical memory until the kernel virtual memory starts. That means that + * any out-of-bounds memory accesses will hopefully be caught. + * The vmalloc() routines leaves a hole of 4kB between each vmalloced + * area for the same reason. ;) + */ +#define VMALLOC_OFFSET (8 * 1024 * 1024) + +#ifndef __ASSEMBLER__ +extern bool __vmalloc_start_set; /* set once high_memory is set */ +#endif + +#define VMALLOC_START ((unsigned long)high_memory + VMALLOC_OFFSET) +#ifdef CONFIG_X86_PAE +#define LAST_PKMAP 512 +#else +#define LAST_PKMAP 1024 +#endif + +#define PKMAP_BASE ((FIXADDR_BOOT_START - PAGE_SIZE * (LAST_PKMAP + 1)) \ + & PMD_MASK) + +#ifdef CONFIG_HIGHMEM +# define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE) +#else +# define VMALLOC_END (FIXADDR_START - 2 * PAGE_SIZE) +#endif + +#define MAXMEM (VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE) + +#endif /* _ASM_X86_PGTABLE_32_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/pgtable_64.h +++ linux-2.6-tip/arch/x86/include/asm/pgtable_64.h @@ -2,6 +2,8 @@ #define _ASM_X86_PGTABLE_64_H #include +#include + #ifndef __ASSEMBLY__ /* @@ -11,7 +13,6 @@ #include #include #include -#include extern pud_t level3_kernel_pgt[512]; extern pud_t level3_ident_pgt[512]; @@ -26,32 +27,6 @@ extern void paging_init(void); #endif /* !__ASSEMBLY__ */ -#define SHARED_KERNEL_PMD 0 - -/* - * PGDIR_SHIFT determines what a top-level page table entry can map - */ -#define PGDIR_SHIFT 39 -#define PTRS_PER_PGD 512 - -/* - * 3rd level page - */ -#define PUD_SHIFT 30 -#define PTRS_PER_PUD 512 - -/* - * PMD_SHIFT determines the size of the area a middle-level - * page table can map - */ -#define PMD_SHIFT 21 -#define PTRS_PER_PMD 512 - -/* - * entries per page directory level - */ -#define PTRS_PER_PTE 512 - #ifndef __ASSEMBLY__ #define pte_ERROR(e) \ @@ -67,9 +42,6 @@ extern void paging_init(void); printk("%s:%d: bad pgd %p(%016lx).\n", \ __FILE__, __LINE__, &(e), pgd_val(e)) -#define pgd_none(x) (!pgd_val(x)) -#define pud_none(x) (!pud_val(x)) - struct mm_struct; void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte); @@ -134,48 +106,6 @@ static inline void native_pgd_clear(pgd_ native_set_pgd(pgd, native_make_pgd(0)); } -#define pte_same(a, b) ((a).pte == (b).pte) - -#endif /* !__ASSEMBLY__ */ - -#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT) -#define PMD_MASK (~(PMD_SIZE - 1)) -#define PUD_SIZE (_AC(1, UL) << PUD_SHIFT) -#define PUD_MASK (~(PUD_SIZE - 1)) -#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT) -#define PGDIR_MASK (~(PGDIR_SIZE - 1)) - - -#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL) -#define VMALLOC_START _AC(0xffffc20000000000, UL) -#define VMALLOC_END _AC(0xffffe1ffffffffff, UL) -#define VMEMMAP_START _AC(0xffffe20000000000, UL) -#define MODULES_VADDR _AC(0xffffffffa0000000, UL) -#define MODULES_END _AC(0xffffffffff000000, UL) -#define MODULES_LEN (MODULES_END - MODULES_VADDR) - -#ifndef __ASSEMBLY__ - -static inline int pgd_bad(pgd_t pgd) -{ - return (pgd_val(pgd) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE; -} - -static inline int pud_bad(pud_t pud) -{ - return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE; -} - -static inline int pmd_bad(pmd_t pmd) -{ - return (pmd_val(pmd) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE; -} - -#define pte_none(x) (!pte_val((x))) -#define pte_present(x) (pte_val((x)) & (_PAGE_PRESENT | _PAGE_PROTNONE)) - -#define pages_to_mb(x) ((x) >> (20 - PAGE_SHIFT)) /* FIXME: is this right? */ - /* * Conversion functions: convert a page and protection to a page entry, * and a page entry and page directory to the page they refer to. @@ -184,41 +114,12 @@ static inline int pmd_bad(pmd_t pmd) /* * Level 4 access. */ -#define pgd_page_vaddr(pgd) \ - ((unsigned long)__va((unsigned long)pgd_val((pgd)) & PTE_PFN_MASK)) -#define pgd_page(pgd) (pfn_to_page(pgd_val((pgd)) >> PAGE_SHIFT)) -#define pgd_present(pgd) (pgd_val(pgd) & _PAGE_PRESENT) static inline int pgd_large(pgd_t pgd) { return 0; } #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE) /* PUD - Level3 access */ -/* to find an entry in a page-table-directory. */ -#define pud_page_vaddr(pud) \ - ((unsigned long)__va(pud_val((pud)) & PHYSICAL_PAGE_MASK)) -#define pud_page(pud) (pfn_to_page(pud_val((pud)) >> PAGE_SHIFT)) -#define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)) -#define pud_offset(pgd, address) \ - ((pud_t *)pgd_page_vaddr(*(pgd)) + pud_index((address))) -#define pud_present(pud) (pud_val((pud)) & _PAGE_PRESENT) - -static inline int pud_large(pud_t pte) -{ - return (pud_val(pte) & (_PAGE_PSE | _PAGE_PRESENT)) == - (_PAGE_PSE | _PAGE_PRESENT); -} /* PMD - Level 2 access */ -#define pmd_page_vaddr(pmd) ((unsigned long) __va(pmd_val((pmd)) & PTE_PFN_MASK)) -#define pmd_page(pmd) (pfn_to_page(pmd_val((pmd)) >> PAGE_SHIFT)) - -#define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)) -#define pmd_offset(dir, address) ((pmd_t *)pud_page_vaddr(*(dir)) + \ - pmd_index(address)) -#define pmd_none(x) (!pmd_val((x))) -#define pmd_present(x) (pmd_val((x)) & _PAGE_PRESENT) -#define pfn_pmd(nr, prot) (__pmd(((nr) << PAGE_SHIFT) | pgprot_val((prot)))) -#define pmd_pfn(x) ((pmd_val((x)) & __PHYSICAL_MASK) >> PAGE_SHIFT) - #define pte_to_pgoff(pte) ((pte_val((pte)) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT) #define pgoff_to_pte(off) ((pte_t) { .pte = ((off) << PAGE_SHIFT) | \ _PAGE_FILE }) @@ -226,18 +127,13 @@ static inline int pud_large(pud_t pte) /* PTE - Level 1 access. */ -/* page, protection -> pte */ -#define mk_pte(page, pgprot) pfn_pte(page_to_pfn((page)), (pgprot)) - -#define pte_index(address) (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) -#define pte_offset_kernel(dir, address) ((pte_t *) pmd_page_vaddr(*(dir)) + \ - pte_index((address))) - /* x86-64 always has all page tables mapped. */ #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address)) #define pte_offset_map_nested(dir, address) pte_offset_kernel((dir), (address)) -#define pte_unmap(pte) /* NOP */ -#define pte_unmap_nested(pte) /* NOP */ +#define pte_offset_map_direct(dir, address) pte_offset_kernel((dir), (address)) +#define pte_unmap(pte) do { } while (0) +#define pte_unmap_nested(pte) do { } while (0) +#define pte_unmap_direct(pte) do { } while (0) #define update_mmu_cache(vma, address, pte) do { } while (0) @@ -266,9 +162,6 @@ extern int direct_gbpages; extern int kern_addr_valid(unsigned long addr); extern void cleanup_highmap(void); -#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \ - remap_pfn_range(vma, vaddr, pfn, size, prot) - #define HAVE_ARCH_UNMAPPED_AREA #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN Index: linux-2.6-tip/arch/x86/include/asm/pgtable_64_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/pgtable_64_types.h @@ -0,0 +1,63 @@ +#ifndef _ASM_X86_PGTABLE_64_DEFS_H +#define _ASM_X86_PGTABLE_64_DEFS_H + +#ifndef __ASSEMBLY__ +#include + +/* + * These are used to make use of C type-checking.. + */ +typedef unsigned long pteval_t; +typedef unsigned long pmdval_t; +typedef unsigned long pudval_t; +typedef unsigned long pgdval_t; +typedef unsigned long pgprotval_t; + +typedef struct { pteval_t pte; } pte_t; + +#endif /* !__ASSEMBLY__ */ + +#define SHARED_KERNEL_PMD 0 +#define PAGETABLE_LEVELS 4 + +/* + * PGDIR_SHIFT determines what a top-level page table entry can map + */ +#define PGDIR_SHIFT 39 +#define PTRS_PER_PGD 512 + +/* + * 3rd level page + */ +#define PUD_SHIFT 30 +#define PTRS_PER_PUD 512 + +/* + * PMD_SHIFT determines the size of the area a middle-level + * page table can map + */ +#define PMD_SHIFT 21 +#define PTRS_PER_PMD 512 + +/* + * entries per page directory level + */ +#define PTRS_PER_PTE 512 + +#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT) +#define PMD_MASK (~(PMD_SIZE - 1)) +#define PUD_SIZE (_AC(1, UL) << PUD_SHIFT) +#define PUD_MASK (~(PUD_SIZE - 1)) +#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT) +#define PGDIR_MASK (~(PGDIR_SIZE - 1)) + + +#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL) +#define VMALLOC_START _AC(0xffffc20000000000, UL) +#define VMALLOC_END _AC(0xffffe1ffffffffff, UL) +#define VMEMMAP_START _AC(0xffffe20000000000, UL) +#define MODULES_VADDR _AC(0xffffffffa0000000, UL) +#define MODULES_END _AC(0xffffffffff000000, UL) +#define MODULES_LEN (MODULES_END - MODULES_VADDR) + +#endif /* _ASM_X86_PGTABLE_64_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/pgtable_types.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/pgtable_types.h @@ -0,0 +1,334 @@ +#ifndef _ASM_X86_PGTABLE_DEFS_H +#define _ASM_X86_PGTABLE_DEFS_H + +#include +#include + +#define FIRST_USER_ADDRESS 0 + +#define _PAGE_BIT_PRESENT 0 /* is present */ +#define _PAGE_BIT_RW 1 /* writeable */ +#define _PAGE_BIT_USER 2 /* userspace addressable */ +#define _PAGE_BIT_PWT 3 /* page write through */ +#define _PAGE_BIT_PCD 4 /* page cache disabled */ +#define _PAGE_BIT_ACCESSED 5 /* was accessed (raised by CPU) */ +#define _PAGE_BIT_DIRTY 6 /* was written to (raised by CPU) */ +#define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */ +#define _PAGE_BIT_PAT 7 /* on 4KB pages */ +#define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ +#define _PAGE_BIT_UNUSED1 9 /* available for programmer */ +#define _PAGE_BIT_IOMAP 10 /* flag used to indicate IO mapping */ +#define _PAGE_BIT_HIDDEN 11 /* hidden by kmemcheck */ +#define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ +#define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1 +#define _PAGE_BIT_CPA_TEST _PAGE_BIT_UNUSED1 +#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */ + +/* If _PAGE_BIT_PRESENT is clear, we use these: */ +/* - if the user mapped it with PROT_NONE; pte_present gives true */ +#define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL +/* - set: nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_BIT_FILE _PAGE_BIT_DIRTY + +#define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) +#define _PAGE_RW (_AT(pteval_t, 1) << _PAGE_BIT_RW) +#define _PAGE_USER (_AT(pteval_t, 1) << _PAGE_BIT_USER) +#define _PAGE_PWT (_AT(pteval_t, 1) << _PAGE_BIT_PWT) +#define _PAGE_PCD (_AT(pteval_t, 1) << _PAGE_BIT_PCD) +#define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED) +#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY) +#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) +#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) +#define _PAGE_UNUSED1 (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1) +#define _PAGE_IOMAP (_AT(pteval_t, 1) << _PAGE_BIT_IOMAP) +#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT) +#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE) +#define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL) +#define _PAGE_CPA_TEST (_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST) +#define __HAVE_ARCH_PTE_SPECIAL + +#ifdef CONFIG_KMEMCHECK +#define _PAGE_HIDDEN (_AT(pteval_t, 1) << _PAGE_BIT_HIDDEN) +#else +#define _PAGE_HIDDEN (_AT(pteval_t, 0)) +#endif + +#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) +#define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) +#else +#define _PAGE_NX (_AT(pteval_t, 0)) +#endif + +#define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE) +#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) + +#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ + _PAGE_ACCESSED | _PAGE_DIRTY) +#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ + _PAGE_DIRTY) + +/* Set of bits not changed in pte_modify */ +#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY) + +#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) +#define _PAGE_CACHE_WB (0) +#define _PAGE_CACHE_WC (_PAGE_PWT) +#define _PAGE_CACHE_UC_MINUS (_PAGE_PCD) +#define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT) + +#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED) +#define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ + _PAGE_ACCESSED | _PAGE_NX) + +#define PAGE_SHARED_EXEC __pgprot(_PAGE_PRESENT | _PAGE_RW | \ + _PAGE_USER | _PAGE_ACCESSED) +#define PAGE_COPY_NOEXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ + _PAGE_ACCESSED | _PAGE_NX) +#define PAGE_COPY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ + _PAGE_ACCESSED) +#define PAGE_COPY PAGE_COPY_NOEXEC +#define PAGE_READONLY __pgprot(_PAGE_PRESENT | _PAGE_USER | \ + _PAGE_ACCESSED | _PAGE_NX) +#define PAGE_READONLY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ + _PAGE_ACCESSED) + +#define __PAGE_KERNEL_EXEC \ + (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL) +#define __PAGE_KERNEL (__PAGE_KERNEL_EXEC | _PAGE_NX) + +#define __PAGE_KERNEL_RO (__PAGE_KERNEL & ~_PAGE_RW) +#define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW) +#define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT) +#define __PAGE_KERNEL_WC (__PAGE_KERNEL | _PAGE_CACHE_WC) +#define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT) +#define __PAGE_KERNEL_UC_MINUS (__PAGE_KERNEL | _PAGE_PCD) +#define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER) +#define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT) +#define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE) +#define __PAGE_KERNEL_LARGE_NOCACHE (__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE) +#define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE) + +#define __PAGE_KERNEL_IO (__PAGE_KERNEL | _PAGE_IOMAP) +#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP) +#define __PAGE_KERNEL_IO_UC_MINUS (__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP) +#define __PAGE_KERNEL_IO_WC (__PAGE_KERNEL_WC | _PAGE_IOMAP) + +#define PAGE_KERNEL __pgprot(__PAGE_KERNEL) +#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO) +#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC) +#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX) +#define PAGE_KERNEL_WC __pgprot(__PAGE_KERNEL_WC) +#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE) +#define PAGE_KERNEL_UC_MINUS __pgprot(__PAGE_KERNEL_UC_MINUS) +#define PAGE_KERNEL_EXEC_NOCACHE __pgprot(__PAGE_KERNEL_EXEC_NOCACHE) +#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE) +#define PAGE_KERNEL_LARGE_NOCACHE __pgprot(__PAGE_KERNEL_LARGE_NOCACHE) +#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC) +#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL) +#define PAGE_KERNEL_VSYSCALL_NOCACHE __pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE) + +#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO) +#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE) +#define PAGE_KERNEL_IO_UC_MINUS __pgprot(__PAGE_KERNEL_IO_UC_MINUS) +#define PAGE_KERNEL_IO_WC __pgprot(__PAGE_KERNEL_IO_WC) + +/* xwr */ +#define __P000 PAGE_NONE +#define __P001 PAGE_READONLY +#define __P010 PAGE_COPY +#define __P011 PAGE_COPY +#define __P100 PAGE_READONLY_EXEC +#define __P101 PAGE_READONLY_EXEC +#define __P110 PAGE_COPY_EXEC +#define __P111 PAGE_COPY_EXEC + +#define __S000 PAGE_NONE +#define __S001 PAGE_READONLY +#define __S010 PAGE_SHARED +#define __S011 PAGE_SHARED +#define __S100 PAGE_READONLY_EXEC +#define __S101 PAGE_READONLY_EXEC +#define __S110 PAGE_SHARED_EXEC +#define __S111 PAGE_SHARED_EXEC + +/* + * early identity mapping pte attrib macros. + */ +#ifdef CONFIG_X86_64 +#define __PAGE_KERNEL_IDENT_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC +#else +/* + * For PDE_IDENT_ATTR include USER bit. As the PDE and PTE protection + * bits are combined, this will alow user to access the high address mapped + * VDSO in the presence of CONFIG_COMPAT_VDSO + */ +#define PTE_IDENT_ATTR 0x003 /* PRESENT+RW */ +#define PDE_IDENT_ATTR 0x067 /* PRESENT+RW+USER+DIRTY+ACCESSED */ +#define PGD_IDENT_ATTR 0x001 /* PRESENT (no other attributes) */ +#endif + +#ifdef CONFIG_X86_32 +# include "pgtable_32_types.h" +#else +# include "pgtable_64_types.h" +#endif + +#ifndef __ASSEMBLY__ + +#include + +/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */ +#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK) + +/* PTE_FLAGS_MASK extracts the flags from a (pte|pmd|pud|pgd)val_t */ +#define PTE_FLAGS_MASK (~PTE_PFN_MASK) + +typedef struct pgprot { pgprotval_t pgprot; } pgprot_t; + +typedef struct { pgdval_t pgd; } pgd_t; + +static inline pgd_t native_make_pgd(pgdval_t val) +{ + return (pgd_t) { val }; +} + +static inline pgdval_t native_pgd_val(pgd_t pgd) +{ + return pgd.pgd; +} + +static inline pgdval_t pgd_flags(pgd_t pgd) +{ + return native_pgd_val(pgd) & PTE_FLAGS_MASK; +} + +#if PAGETABLE_LEVELS > 3 +typedef struct { pudval_t pud; } pud_t; + +static inline pud_t native_make_pud(pmdval_t val) +{ + return (pud_t) { val }; +} + +static inline pudval_t native_pud_val(pud_t pud) +{ + return pud.pud; +} +#else +#include + +static inline pudval_t native_pud_val(pud_t pud) +{ + return native_pgd_val(pud.pgd); +} +#endif + +#if PAGETABLE_LEVELS > 2 +typedef struct { pmdval_t pmd; } pmd_t; + +static inline pmd_t native_make_pmd(pmdval_t val) +{ + return (pmd_t) { val }; +} + +static inline pmdval_t native_pmd_val(pmd_t pmd) +{ + return pmd.pmd; +} +#else +#include + +static inline pmdval_t native_pmd_val(pmd_t pmd) +{ + return native_pgd_val(pmd.pud.pgd); +} +#endif + +static inline pudval_t pud_flags(pud_t pud) +{ + return native_pud_val(pud) & PTE_FLAGS_MASK; +} + +static inline pmdval_t pmd_flags(pmd_t pmd) +{ + return native_pmd_val(pmd) & PTE_FLAGS_MASK; +} + +static inline pte_t native_make_pte(pteval_t val) +{ + return (pte_t) { .pte = val }; +} + +static inline pteval_t native_pte_val(pte_t pte) +{ + return pte.pte; +} + +static inline pteval_t pte_flags(pte_t pte) +{ + return native_pte_val(pte) & PTE_FLAGS_MASK; +} + +#define pgprot_val(x) ((x).pgprot) +#define __pgprot(x) ((pgprot_t) { (x) } ) + + +typedef struct page *pgtable_t; + +extern pteval_t __supported_pte_mask; +extern int nx_enabled; +extern void set_nx(void); + +#define pgprot_writecombine pgprot_writecombine +extern pgprot_t pgprot_writecombine(pgprot_t prot); + +/* Indicate that x86 has its own track and untrack pfn vma functions */ +#define __HAVE_PFNMAP_TRACKING + +#define __HAVE_PHYS_MEM_ACCESS_PROT +struct file; +pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, + unsigned long size, pgprot_t vma_prot); +int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, + unsigned long size, pgprot_t *vma_prot); + +/* Install a pte for a particular vaddr in kernel space. */ +void set_pte_vaddr(unsigned long vaddr, pte_t pte); + +#ifdef CONFIG_X86_32 +extern void native_pagetable_setup_start(pgd_t *base); +extern void native_pagetable_setup_done(pgd_t *base); +#else +static inline void native_pagetable_setup_start(pgd_t *base) {} +static inline void native_pagetable_setup_done(pgd_t *base) {} +#endif + +struct seq_file; +extern void arch_report_meminfo(struct seq_file *m); + +enum { + PG_LEVEL_NONE, + PG_LEVEL_4K, + PG_LEVEL_2M, + PG_LEVEL_1G, + PG_LEVEL_NUM +}; + +#ifdef CONFIG_PROC_FS +extern void update_page_count(int level, unsigned long pages); +#else +static inline void update_page_count(int level, unsigned long pages) { } +#endif + +/* + * Helper function that returns the kernel pagetable entry controlling + * the virtual address 'address'. NULL means no pagetable entry present. + * NOTE: the return type is pte_t but if the pmd is PSE then we return it + * as a pte too. + */ +extern pte_t *lookup_address(unsigned long address, unsigned int *level); + +#endif /* !__ASSEMBLY__ */ + +#endif /* _ASM_X86_PGTABLE_DEFS_H */ Index: linux-2.6-tip/arch/x86/include/asm/prctl.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/prctl.h +++ linux-2.6-tip/arch/x86/include/asm/prctl.h @@ -6,8 +6,4 @@ #define ARCH_GET_FS 0x1003 #define ARCH_GET_GS 0x1004 -#ifdef CONFIG_X86_64 -extern long sys_arch_prctl(int, unsigned long); -#endif /* CONFIG_X86_64 */ - #endif /* _ASM_X86_PRCTL_H */ Index: linux-2.6-tip/arch/x86/include/asm/processor.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/processor.h +++ linux-2.6-tip/arch/x86/include/asm/processor.h @@ -16,6 +16,7 @@ struct mm_struct; #include #include #include +#include #include #include #include @@ -73,10 +74,10 @@ struct cpuinfo_x86 { char pad0; #else /* Number of 4K pages in DTLB/ITLB combined(in pages): */ - int x86_tlbsize; + int x86_tlbsize; +#endif __u8 x86_virt_bits; __u8 x86_phys_bits; -#endif /* CPUID returned core id bits: */ __u8 x86_coreid_bits; /* Max extended CPUID function supported: */ @@ -93,7 +94,7 @@ struct cpuinfo_x86 { unsigned long loops_per_jiffy; #ifdef CONFIG_SMP /* cpus sharing the last level cache: */ - cpumask_t llc_shared_map; + cpumask_var_t llc_shared_map; #endif /* cpuid returned max cores value: */ u16 x86_max_cores; @@ -247,7 +248,6 @@ struct x86_hw_tss { #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long)) #define IO_BITMAP_OFFSET offsetof(struct tss_struct, io_bitmap) #define INVALID_IO_BITMAP_OFFSET 0x8000 -#define INVALID_IO_BITMAP_OFFSET_LAZY 0x9000 struct tss_struct { /* @@ -262,11 +262,6 @@ struct tss_struct { * be within the limit. */ unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; - /* - * Cache the current maximum and the last task that used the bitmap: - */ - unsigned long io_bitmap_max; - struct thread_struct *io_bitmap_owner; /* * .. and then another 0x100 bytes for the emergency kernel stack: @@ -378,9 +373,33 @@ union thread_xstate { #ifdef CONFIG_X86_64 DECLARE_PER_CPU(struct orig_ist, orig_ist); + +union irq_stack_union { + char irq_stack[IRQ_STACK_SIZE]; + /* + * GCC hardcodes the stack canary as %gs:40. Since the + * irq_stack is the object at %gs:0, we reserve the bottom + * 48 bytes of the irq stack for the canary. + */ + struct { + char gs_base[40]; + unsigned long stack_canary; + }; +}; + +DECLARE_PER_CPU(union irq_stack_union, irq_stack_union); +DECLARE_INIT_PER_CPU(irq_stack_union); + +DECLARE_PER_CPU(char *, irq_stack_ptr); +DECLARE_PER_CPU(unsigned int, irq_count); +extern unsigned long kernel_eflags; +extern asmlinkage void ignore_sysret(void); +#else /* X86_64 */ +#ifdef CONFIG_CC_STACKPROTECTOR +DECLARE_PER_CPU(unsigned long, stack_canary); #endif +#endif /* X86_64 */ -extern void print_cpu_info(struct cpuinfo_x86 *); extern unsigned int xstate_size; extern void free_thread_xstate(struct task_struct *); extern struct kmem_cache *task_xstate_cachep; @@ -717,6 +736,7 @@ static inline void __sti_mwait(unsigned extern void mwait_idle_with_hints(unsigned long eax, unsigned long ecx); extern void select_idle_routine(const struct cpuinfo_x86 *c); +extern void init_c1e_mask(void); extern unsigned long boot_option_idle_override; extern unsigned long idle_halt; @@ -752,9 +772,9 @@ extern int sysenter_setup(void); extern struct desc_ptr early_gdt_descr; extern void cpu_set_gdt(int); -extern void switch_to_new_gdt(void); +extern void switch_to_new_gdt(int); +extern void load_percpu_segment(int); extern void cpu_init(void); -extern void init_gdt(int cpu); static inline unsigned long get_debugctlmsr(void) { @@ -839,6 +859,7 @@ static inline void spin_lock_prefetch(co * User space process size: 3GB (default). */ #define TASK_SIZE PAGE_OFFSET +#define TASK_SIZE_MAX TASK_SIZE #define STACK_TOP TASK_SIZE #define STACK_TOP_MAX STACK_TOP @@ -898,7 +919,7 @@ extern unsigned long thread_saved_pc(str /* * User space process size. 47bits minus one guard page. */ -#define TASK_SIZE64 ((1UL << 47) - PAGE_SIZE) +#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE) /* This decides where the kernel will search for a free chunk of vm * space during mmap's. @@ -907,12 +928,12 @@ extern unsigned long thread_saved_pc(str 0xc0000000 : 0xFFFFe000) #define TASK_SIZE (test_thread_flag(TIF_IA32) ? \ - IA32_PAGE_OFFSET : TASK_SIZE64) + IA32_PAGE_OFFSET : TASK_SIZE_MAX) #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_IA32)) ? \ - IA32_PAGE_OFFSET : TASK_SIZE64) + IA32_PAGE_OFFSET : TASK_SIZE_MAX) #define STACK_TOP TASK_SIZE -#define STACK_TOP_MAX TASK_SIZE64 +#define STACK_TOP_MAX TASK_SIZE_MAX #define INIT_THREAD { \ .sp0 = (unsigned long)&init_stack + sizeof(init_stack) \ Index: linux-2.6-tip/arch/x86/include/asm/proto.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/proto.h +++ linux-2.6-tip/arch/x86/include/asm/proto.h @@ -18,11 +18,7 @@ extern void syscall32_cpu_init(void); extern void check_efer(void); -#ifdef CONFIG_X86_BIOS_REBOOT extern int reboot_force; -#else -static const int reboot_force = 0; -#endif long do_arch_prctl(struct task_struct *task, int code, unsigned long addr); Index: linux-2.6-tip/arch/x86/include/asm/ptrace-abi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/ptrace-abi.h +++ linux-2.6-tip/arch/x86/include/asm/ptrace-abi.h @@ -80,8 +80,6 @@ #define PTRACE_SINGLEBLOCK 33 /* resume execution until next branch */ -#ifdef CONFIG_X86_PTRACE_BTS - #ifndef __ASSEMBLY__ #include @@ -140,6 +138,5 @@ struct ptrace_bts_config { BTS records are read from oldest to newest. Returns number of BTS records drained. */ -#endif /* CONFIG_X86_PTRACE_BTS */ #endif /* _ASM_X86_PTRACE_ABI_H */ Index: linux-2.6-tip/arch/x86/include/asm/ptrace.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/ptrace.h +++ linux-2.6-tip/arch/x86/include/asm/ptrace.h @@ -28,7 +28,7 @@ struct pt_regs { int xds; int xes; int xfs; - /* int gs; */ + int xgs; long orig_eax; long eip; int xcs; @@ -50,7 +50,7 @@ struct pt_regs { unsigned long ds; unsigned long es; unsigned long fs; - /* int gs; */ + unsigned long gs; unsigned long orig_ax; unsigned long ip; unsigned long cs; Index: linux-2.6-tip/arch/x86/include/asm/rdc321x_defs.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/rdc321x_defs.h @@ -0,0 +1,12 @@ +#define PFX "rdc321x: " + +/* General purpose configuration and data registers */ +#define RDC3210_CFGREG_ADDR 0x0CF8 +#define RDC3210_CFGREG_DATA 0x0CFC + +#define RDC321X_GPIO_CTRL_REG1 0x48 +#define RDC321X_GPIO_CTRL_REG2 0x84 +#define RDC321X_GPIO_DATA_REG1 0x4c +#define RDC321X_GPIO_DATA_REG2 0x88 + +#define RDC321X_MAX_GPIO 58 Index: linux-2.6-tip/arch/x86/include/asm/sections.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/sections.h +++ linux-2.6-tip/arch/x86/include/asm/sections.h @@ -1 +1,8 @@ +#ifndef _ASM_X86_SECTIONS_H +#define _ASM_X86_SECTIONS_H + #include + +extern char __brk_base[], __brk_limit[]; + +#endif /* _ASM_X86_SECTIONS_H */ Index: linux-2.6-tip/arch/x86/include/asm/segment.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/segment.h +++ linux-2.6-tip/arch/x86/include/asm/segment.h @@ -61,7 +61,7 @@ * * 26 - ESPFIX small SS * 27 - per-cpu [ offset to per-cpu data area ] - * 28 - unused + * 28 - stack_canary-20 [ for stack protector ] * 29 - unused * 30 - unused * 31 - TSS for double fault handler @@ -95,6 +95,13 @@ #define __KERNEL_PERCPU 0 #endif +#define GDT_ENTRY_STACK_CANARY (GDT_ENTRY_KERNEL_BASE + 16) +#ifdef CONFIG_CC_STACKPROTECTOR +#define __KERNEL_STACK_CANARY (GDT_ENTRY_STACK_CANARY * 8) +#else +#define __KERNEL_STACK_CANARY 0 +#endif + #define GDT_ENTRY_DOUBLEFAULT_TSS 31 /* Index: linux-2.6-tip/arch/x86/include/asm/setup.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/setup.h +++ linux-2.6-tip/arch/x86/include/asm/setup.h @@ -1,33 +1,19 @@ #ifndef _ASM_X86_SETUP_H #define _ASM_X86_SETUP_H +#ifdef __KERNEL__ + #define COMMAND_LINE_SIZE 2048 #ifndef __ASSEMBLY__ -/* Interrupt control for vSMPowered x86_64 systems */ -void vsmp_init(void); - - -void setup_bios_corruption_check(void); - - -#ifdef CONFIG_X86_VISWS -extern void visws_early_detect(void); -extern int is_visws_box(void); -#else -static inline void visws_early_detect(void) { } -static inline int is_visws_box(void) { return 0; } -#endif - -extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip); -extern int wakeup_secondary_cpu_via_init(int apicid, unsigned long start_eip); /* * Any setup quirks to be performed? */ struct mpc_cpu; struct mpc_bus; struct mpc_oemtable; + struct x86_quirks { int (*arch_pre_time_init)(void); int (*arch_time_init)(void); @@ -43,20 +29,19 @@ struct x86_quirks { void (*mpc_oem_bus_info)(struct mpc_bus *m, char *name); void (*mpc_oem_pci_bus)(struct mpc_bus *m); void (*smp_read_mpc_oem)(struct mpc_oemtable *oemtable, - unsigned short oemsize); + unsigned short oemsize); int (*setup_ioapic_ids)(void); - int (*update_genapic)(void); }; -extern struct x86_quirks *x86_quirks; -extern unsigned long saved_video_mode; +extern void x86_quirk_pre_intr_init(void); +extern void x86_quirk_intr_init(void); -#ifndef CONFIG_PARAVIRT -#define paravirt_post_allocator_init() do {} while (0) -#endif -#endif /* __ASSEMBLY__ */ +extern void x86_quirk_trap_init(void); -#ifdef __KERNEL__ +extern void x86_quirk_pre_time_init(void); +extern void x86_quirk_time_init(void); + +#endif /* __ASSEMBLY__ */ #ifdef __i386__ @@ -78,6 +63,30 @@ extern unsigned long saved_video_mode; #ifndef __ASSEMBLY__ #include +/* Interrupt control for vSMPowered x86_64 systems */ +#ifdef CONFIG_X86_VSMP +void vsmp_init(void); +#else +static inline void vsmp_init(void) { } +#endif + +void setup_bios_corruption_check(void); + +#ifdef CONFIG_X86_VISWS +extern void visws_early_detect(void); +extern int is_visws_box(void); +#else +static inline void visws_early_detect(void) { } +static inline int is_visws_box(void) { return 0; } +#endif + +extern struct x86_quirks *x86_quirks; +extern unsigned long saved_video_mode; + +#ifndef CONFIG_PARAVIRT +#define paravirt_post_allocator_init() do {} while (0) +#endif + #ifndef _SETUP /* @@ -91,21 +100,51 @@ extern struct boot_params boot_params; */ #define LOWMEMSIZE() (0x9f000) +/* exceedingly early brk-like allocator */ +extern unsigned long _brk_end; +void *extend_brk(size_t size, size_t align); + +/* + * Reserve space in the brk section. The name must be unique within + * the file, and somewhat descriptive. The size is in bytes. Must be + * used at file scope. + * + * (This uses a temp function to wrap the asm so we can pass it the + * size parameter; otherwise we wouldn't be able to. We can't use a + * "section" attribute on a normal variable because it always ends up + * being @progbits, which ends up allocating space in the vmlinux + * executable.) + */ +#define RESERVE_BRK(name,sz) \ + static void __section(.discard) __used \ + __brk_reservation_fn_##name##__(void) { \ + asm volatile ( \ + ".pushsection .brk_reservation,\"aw\",@nobits;" \ + ".brk." #name ":" \ + " 1:.skip %c0;" \ + " .size .brk." #name ", . - 1b;" \ + " .popsection" \ + : : "i" (sz)); \ + } + #ifdef __i386__ void __init i386_start_kernel(void); extern void probe_roms(void); -extern unsigned long init_pg_tables_start; -extern unsigned long init_pg_tables_end; - #else -void __init x86_64_init_pda(void); void __init x86_64_start_kernel(char *real_mode); void __init x86_64_start_reservations(char *real_mode_data); #endif /* __i386__ */ #endif /* _SETUP */ +#else +#define RESERVE_BRK(name,sz) \ + .pushsection .brk_reservation,"aw",@nobits; \ +.brk.name: \ +1: .skip sz; \ + .size .brk.name,.-1b; \ + .popsection #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ Index: linux-2.6-tip/arch/x86/include/asm/setup_arch.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/setup_arch.h @@ -0,0 +1,3 @@ +/* Hook to call BIOS initialisation function */ + +/* no action for generic */ Index: linux-2.6-tip/arch/x86/include/asm/smp.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/smp.h +++ linux-2.6-tip/arch/x86/include/asm/smp.h @@ -15,53 +15,25 @@ # include # endif #endif -#include #include - -#ifdef CONFIG_X86_64 - -extern cpumask_var_t cpu_callin_mask; -extern cpumask_var_t cpu_callout_mask; -extern cpumask_var_t cpu_initialized_mask; -extern cpumask_var_t cpu_sibling_setup_mask; - -#else /* CONFIG_X86_32 */ - -extern cpumask_t cpu_callin_map; -extern cpumask_t cpu_callout_map; -extern cpumask_t cpu_initialized; -extern cpumask_t cpu_sibling_setup_map; - -#define cpu_callin_mask ((struct cpumask *)&cpu_callin_map) -#define cpu_callout_mask ((struct cpumask *)&cpu_callout_map) -#define cpu_initialized_mask ((struct cpumask *)&cpu_initialized) -#define cpu_sibling_setup_mask ((struct cpumask *)&cpu_sibling_setup_map) - -#endif /* CONFIG_X86_32 */ - -extern void (*mtrr_hook)(void); -extern void zap_low_mappings(void); - -extern int __cpuinit get_local_pda(int cpu); +#include extern int smp_num_siblings; extern unsigned int num_processors; -DECLARE_PER_CPU(cpumask_t, cpu_sibling_map); -DECLARE_PER_CPU(cpumask_t, cpu_core_map); +DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_map); +DECLARE_PER_CPU(cpumask_var_t, cpu_core_map); DECLARE_PER_CPU(u16, cpu_llc_id); -#ifdef CONFIG_X86_32 DECLARE_PER_CPU(int, cpu_number); -#endif static inline struct cpumask *cpu_sibling_mask(int cpu) { - return &per_cpu(cpu_sibling_map, cpu); + return per_cpu(cpu_sibling_map, cpu); } static inline struct cpumask *cpu_core_mask(int cpu) { - return &per_cpu(cpu_core_map, cpu); + return per_cpu(cpu_core_map, cpu); } DECLARE_EARLY_PER_CPU(u16, x86_cpu_to_apicid); @@ -149,9 +121,10 @@ static inline void arch_send_call_functi smp_ops.send_call_func_single_ipi(cpu); } -static inline void arch_send_call_function_ipi(cpumask_t mask) +#define arch_send_call_function_ipi_mask arch_send_call_function_ipi_mask +static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask) { - smp_ops.send_call_func_ipi(&mask); + smp_ops.send_call_func_ipi(mask); } void cpu_disable_common(void); @@ -167,8 +140,6 @@ void play_dead_common(void); void native_send_call_func_ipi(const struct cpumask *mask); void native_send_call_func_single_ipi(int cpu); -extern void prefill_possible_map(void); - void smp_store_cpu_info(int id); #define cpu_physical_id(cpu) per_cpu(x86_cpu_to_apicid, cpu) @@ -177,10 +148,6 @@ static inline int num_booting_cpus(void) { return cpumask_weight(cpu_callout_mask); } -#else -static inline void prefill_possible_map(void) -{ -} #endif /* CONFIG_SMP */ extern unsigned disabled_cpus __cpuinitdata; @@ -191,11 +158,11 @@ extern unsigned disabled_cpus __cpuinitd * from the initial startup. We map APIC_BASE very early in page_setup(), * so this is correct in the x86 case. */ -#define raw_smp_processor_id() (x86_read_percpu(cpu_number)) +#define raw_smp_processor_id() (percpu_read(cpu_number)) extern int safe_smp_processor_id(void); #elif defined(CONFIG_X86_64_SMP) -#define raw_smp_processor_id() read_pda(cpunumber) +#define raw_smp_processor_id() (percpu_read(cpu_number)) #define stack_smp_processor_id() \ ({ \ @@ -205,10 +172,6 @@ extern int safe_smp_processor_id(void); }) #define safe_smp_processor_id() smp_processor_id() -#else /* !CONFIG_X86_32_SMP && !CONFIG_X86_64_SMP */ -#define cpu_physical_id(cpu) boot_cpu_physical_apicid -#define safe_smp_processor_id() 0 -#define stack_smp_processor_id() 0 #endif #ifdef CONFIG_X86_LOCAL_APIC @@ -220,28 +183,9 @@ static inline int logical_smp_processor_ return GET_APIC_LOGICAL_ID(*(u32 *)(APIC_BASE + APIC_LDR)); } -#include -static inline unsigned int read_apic_id(void) -{ - unsigned int reg; - - reg = *(u32 *)(APIC_BASE + APIC_ID); - - return GET_APIC_ID(reg); -} #endif - -# if defined(APIC_DEFINITION) || defined(CONFIG_X86_64) extern int hard_smp_processor_id(void); -# else -#include -static inline int hard_smp_processor_id(void) -{ - /* we don't want to mark this access volatile - bad code generation */ - return read_apic_id(); -} -# endif /* APIC_DEFINITION */ #else /* CONFIG_X86_LOCAL_APIC */ @@ -251,11 +195,5 @@ static inline int hard_smp_processor_id( #endif /* CONFIG_X86_LOCAL_APIC */ -#ifdef CONFIG_X86_HAS_BOOT_CPU_ID -extern unsigned char boot_cpu_id; -#else -#define boot_cpu_id 0 -#endif - #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_SMP_H */ Index: linux-2.6-tip/arch/x86/include/asm/smpboot_hooks.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/smpboot_hooks.h @@ -0,0 +1,61 @@ +/* two abstractions specific to kernel/smpboot.c, mainly to cater to visws + * which needs to alter them. */ + +static inline void smpboot_clear_io_apic_irqs(void) +{ +#ifdef CONFIG_X86_IO_APIC + io_apic_irqs = 0; +#endif +} + +static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip) +{ + CMOS_WRITE(0xa, 0xf); + local_flush_tlb(); + pr_debug("1.\n"); + *((volatile unsigned short *)phys_to_virt(apic->trampoline_phys_high)) = + start_eip >> 4; + pr_debug("2.\n"); + *((volatile unsigned short *)phys_to_virt(apic->trampoline_phys_low)) = + start_eip & 0xf; + pr_debug("3.\n"); +} + +static inline void smpboot_restore_warm_reset_vector(void) +{ + /* + * Install writable page 0 entry to set BIOS data area. + */ + local_flush_tlb(); + + /* + * Paranoid: Set warm reset code and vector here back + * to default values. + */ + CMOS_WRITE(0, 0xf); + + *((volatile long *)phys_to_virt(apic->trampoline_phys_low)) = 0; +} + +static inline void __init smpboot_setup_io_apic(void) +{ +#ifdef CONFIG_X86_IO_APIC + /* + * Here we can be sure that there is an IO-APIC in the system. Let's + * go and set it up: + */ + if (!skip_ioapic_setup && nr_ioapics) + setup_IO_APIC(); + else { + nr_ioapics = 0; + localise_nmi_watchdog(); + } +#endif +} + +static inline void smpboot_clear_io_apic(void) +{ +#ifdef CONFIG_X86_IO_APIC + nr_ioapics = 0; +#endif +} Index: linux-2.6-tip/arch/x86/include/asm/spinlock.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/spinlock.h +++ linux-2.6-tip/arch/x86/include/asm/spinlock.h @@ -58,7 +58,7 @@ #if (NR_CPUS < 256) #define TICKET_SHIFT 8 -static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock) +static __always_inline void __ticket_spin_lock(__raw_spinlock_t *lock) { short inc = 0x0100; @@ -77,7 +77,7 @@ static __always_inline void __ticket_spi : "memory", "cc"); } -static __always_inline int __ticket_spin_trylock(raw_spinlock_t *lock) +static __always_inline int __ticket_spin_trylock(__raw_spinlock_t *lock) { int tmp, new; @@ -96,7 +96,7 @@ static __always_inline int __ticket_spin return tmp; } -static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock) +static __always_inline void __ticket_spin_unlock(__raw_spinlock_t *lock) { asm volatile(UNLOCK_LOCK_PREFIX "incb %0" : "+m" (lock->slock) @@ -106,7 +106,7 @@ static __always_inline void __ticket_spi #else #define TICKET_SHIFT 16 -static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock) +static __always_inline void __ticket_spin_lock(__raw_spinlock_t *lock) { int inc = 0x00010000; int tmp; @@ -127,7 +127,7 @@ static __always_inline void __ticket_spi : "memory", "cc"); } -static __always_inline int __ticket_spin_trylock(raw_spinlock_t *lock) +static __always_inline int __ticket_spin_trylock(__raw_spinlock_t *lock) { int tmp; int new; @@ -149,7 +149,7 @@ static __always_inline int __ticket_spin return tmp; } -static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock) +static __always_inline void __ticket_spin_unlock(__raw_spinlock_t *lock) { asm volatile(UNLOCK_LOCK_PREFIX "incw %0" : "+m" (lock->slock) @@ -158,119 +158,57 @@ static __always_inline void __ticket_spi } #endif -static inline int __ticket_spin_is_locked(raw_spinlock_t *lock) +static inline int __ticket_spin_is_locked(__raw_spinlock_t *lock) { int tmp = ACCESS_ONCE(lock->slock); return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1)); } -static inline int __ticket_spin_is_contended(raw_spinlock_t *lock) +static inline int __ticket_spin_is_contended(__raw_spinlock_t *lock) { int tmp = ACCESS_ONCE(lock->slock); return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1; } -#ifdef CONFIG_PARAVIRT -/* - * Define virtualization-friendly old-style lock byte lock, for use in - * pv_lock_ops if desired. - * - * This differs from the pre-2.6.24 spinlock by always using xchgb - * rather than decb to take the lock; this allows it to use a - * zero-initialized lock structure. It also maintains a 1-byte - * contention counter, so that we can implement - * __byte_spin_is_contended. - */ -struct __byte_spinlock { - s8 lock; - s8 spinners; -}; - -static inline int __byte_spin_is_locked(raw_spinlock_t *lock) -{ - struct __byte_spinlock *bl = (struct __byte_spinlock *)lock; - return bl->lock != 0; -} +#ifndef CONFIG_PARAVIRT -static inline int __byte_spin_is_contended(raw_spinlock_t *lock) -{ - struct __byte_spinlock *bl = (struct __byte_spinlock *)lock; - return bl->spinners != 0; -} - -static inline void __byte_spin_lock(raw_spinlock_t *lock) -{ - struct __byte_spinlock *bl = (struct __byte_spinlock *)lock; - s8 val = 1; - - asm("1: xchgb %1, %0\n" - " test %1,%1\n" - " jz 3f\n" - " " LOCK_PREFIX "incb %2\n" - "2: rep;nop\n" - " cmpb $1, %0\n" - " je 2b\n" - " " LOCK_PREFIX "decb %2\n" - " jmp 1b\n" - "3:" - : "+m" (bl->lock), "+q" (val), "+m" (bl->spinners): : "memory"); -} - -static inline int __byte_spin_trylock(raw_spinlock_t *lock) -{ - struct __byte_spinlock *bl = (struct __byte_spinlock *)lock; - u8 old = 1; - - asm("xchgb %1,%0" - : "+m" (bl->lock), "+q" (old) : : "memory"); - - return old == 0; -} - -static inline void __byte_spin_unlock(raw_spinlock_t *lock) -{ - struct __byte_spinlock *bl = (struct __byte_spinlock *)lock; - smp_wmb(); - bl->lock = 0; -} -#else /* !CONFIG_PARAVIRT */ -static inline int __raw_spin_is_locked(raw_spinlock_t *lock) +static inline int __raw_spin_is_locked(__raw_spinlock_t *lock) { return __ticket_spin_is_locked(lock); } -static inline int __raw_spin_is_contended(raw_spinlock_t *lock) +static inline int __raw_spin_is_contended(__raw_spinlock_t *lock) { return __ticket_spin_is_contended(lock); } #define __raw_spin_is_contended __raw_spin_is_contended -static __always_inline void __raw_spin_lock(raw_spinlock_t *lock) +static __always_inline void __raw_spin_lock(__raw_spinlock_t *lock) { __ticket_spin_lock(lock); } -static __always_inline int __raw_spin_trylock(raw_spinlock_t *lock) +static __always_inline int __raw_spin_trylock(__raw_spinlock_t *lock) { return __ticket_spin_trylock(lock); } -static __always_inline void __raw_spin_unlock(raw_spinlock_t *lock) +static __always_inline void __raw_spin_unlock(__raw_spinlock_t *lock) { __ticket_spin_unlock(lock); } -static __always_inline void __raw_spin_lock_flags(raw_spinlock_t *lock, +static __always_inline void __raw_spin_lock_flags(__raw_spinlock_t *lock, unsigned long flags) { __raw_spin_lock(lock); } -#endif /* CONFIG_PARAVIRT */ +#endif -static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock) +static inline void __raw_spin_unlock_wait(__raw_spinlock_t *lock) { while (__raw_spin_is_locked(lock)) cpu_relax(); @@ -294,7 +232,7 @@ static inline void __raw_spin_unlock_wai * read_can_lock - would read_trylock() succeed? * @lock: the rwlock in question. */ -static inline int __raw_read_can_lock(raw_rwlock_t *lock) +static inline int __raw_read_can_lock(__raw_rwlock_t *lock) { return (int)(lock)->lock > 0; } @@ -303,12 +241,12 @@ static inline int __raw_read_can_lock(ra * write_can_lock - would write_trylock() succeed? * @lock: the rwlock in question. */ -static inline int __raw_write_can_lock(raw_rwlock_t *lock) +static inline int __raw_write_can_lock(__raw_rwlock_t *lock) { return (lock)->lock == RW_LOCK_BIAS; } -static inline void __raw_read_lock(raw_rwlock_t *rw) +static inline void __raw_read_lock(__raw_rwlock_t *rw) { asm volatile(LOCK_PREFIX " subl $1,(%0)\n\t" "jns 1f\n" @@ -317,7 +255,7 @@ static inline void __raw_read_lock(raw_r ::LOCK_PTR_REG (rw) : "memory"); } -static inline void __raw_write_lock(raw_rwlock_t *rw) +static inline void __raw_write_lock(__raw_rwlock_t *rw) { asm volatile(LOCK_PREFIX " subl %1,(%0)\n\t" "jz 1f\n" @@ -326,18 +264,17 @@ static inline void __raw_write_lock(raw_ ::LOCK_PTR_REG (rw), "i" (RW_LOCK_BIAS) : "memory"); } -static inline int __raw_read_trylock(raw_rwlock_t *lock) +static inline int __raw_read_trylock(__raw_rwlock_t *lock) { atomic_t *count = (atomic_t *)lock; - atomic_dec(count); - if (atomic_read(count) >= 0) + if (atomic_dec_return(count) >= 0) return 1; atomic_inc(count); return 0; } -static inline int __raw_write_trylock(raw_rwlock_t *lock) +static inline int __raw_write_trylock(__raw_rwlock_t *lock) { atomic_t *count = (atomic_t *)lock; @@ -347,19 +284,19 @@ static inline int __raw_write_trylock(ra return 0; } -static inline void __raw_read_unlock(raw_rwlock_t *rw) +static inline void __raw_read_unlock(__raw_rwlock_t *rw) { asm volatile(LOCK_PREFIX "incl %0" :"+m" (rw->lock) : : "memory"); } -static inline void __raw_write_unlock(raw_rwlock_t *rw) +static inline void __raw_write_unlock(__raw_rwlock_t *rw) { asm volatile(LOCK_PREFIX "addl %1, %0" : "+m" (rw->lock) : "i" (RW_LOCK_BIAS) : "memory"); } -#define _raw_spin_relax(lock) cpu_relax() -#define _raw_read_relax(lock) cpu_relax() -#define _raw_write_relax(lock) cpu_relax() +#define __raw_spin_relax(lock) cpu_relax() +#define __raw_read_relax(lock) cpu_relax() +#define __raw_write_relax(lock) cpu_relax() #endif /* _ASM_X86_SPINLOCK_H */ Index: linux-2.6-tip/arch/x86/include/asm/stackprotector.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/stackprotector.h @@ -0,0 +1,124 @@ +/* + * GCC stack protector support. + * + * Stack protector works by putting predefined pattern at the start of + * the stack frame and verifying that it hasn't been overwritten when + * returning from the function. The pattern is called stack canary + * and unfortunately gcc requires it to be at a fixed offset from %gs. + * On x86_64, the offset is 40 bytes and on x86_32 20 bytes. x86_64 + * and x86_32 use segment registers differently and thus handles this + * requirement differently. + * + * On x86_64, %gs is shared by percpu area and stack canary. All + * percpu symbols are zero based and %gs points to the base of percpu + * area. The first occupant of the percpu area is always + * irq_stack_union which contains stack_canary at offset 40. Userland + * %gs is always saved and restored on kernel entry and exit using + * swapgs, so stack protector doesn't add any complexity there. + * + * On x86_32, it's slightly more complicated. As in x86_64, %gs is + * used for userland TLS. Unfortunately, some processors are much + * slower at loading segment registers with different value when + * entering and leaving the kernel, so the kernel uses %fs for percpu + * area and manages %gs lazily so that %gs is switched only when + * necessary, usually during task switch. + * + * As gcc requires the stack canary at %gs:20, %gs can't be managed + * lazily if stack protector is enabled, so the kernel saves and + * restores userland %gs on kernel entry and exit. This behavior is + * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in + * system.h to hide the details. + */ + +#ifndef _ASM_STACKPROTECTOR_H +#define _ASM_STACKPROTECTOR_H 1 + +#ifdef CONFIG_CC_STACKPROTECTOR + +#include +#include +#include +#include +#include +#include + +/* + * 24 byte read-only segment initializer for stack canary. Linker + * can't handle the address bit shifting. Address will be set in + * head_32 for boot CPU and setup_per_cpu_areas() for others. + */ +#define GDT_STACK_CANARY_INIT \ + [GDT_ENTRY_STACK_CANARY] = { { { 0x00000018, 0x00409000 } } }, + +/* + * Initialize the stackprotector canary value. + * + * NOTE: this must only be called from functions that never return, + * and it must always be inlined. + */ +static __always_inline void boot_init_stack_canary(void) +{ + u64 canary; + u64 tsc; + +#ifdef CONFIG_X86_64 + BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40); +#endif + /* + * We both use the random pool and the current TSC as a source + * of randomness. The TSC only matters for very early init, + * there it already has some randomness on most systems. Later + * on during the bootup the random pool has true entropy too. + */ + get_random_bytes(&canary, sizeof(canary)); + tsc = __native_read_tsc(); + canary += tsc + (tsc << 32UL); + + current->stack_canary = canary; +#ifdef CONFIG_X86_64 + percpu_write(irq_stack_union.stack_canary, canary); +#else + percpu_write(stack_canary, canary); +#endif +} + +static inline void setup_stack_canary_segment(int cpu) +{ +#ifdef CONFIG_X86_32 + unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu) - 20; + struct desc_struct *gdt_table = get_cpu_gdt_table(cpu); + struct desc_struct desc; + + desc = gdt_table[GDT_ENTRY_STACK_CANARY]; + desc.base0 = canary & 0xffff; + desc.base1 = (canary >> 16) & 0xff; + desc.base2 = (canary >> 24) & 0xff; + write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S); +#endif +} + +static inline void load_stack_canary_segment(void) +{ +#ifdef CONFIG_X86_32 + asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory"); +#endif +} + +#else /* CC_STACKPROTECTOR */ + +#define GDT_STACK_CANARY_INIT + +/* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */ + +static inline void setup_stack_canary_segment(int cpu) +{ } + +static inline void load_stack_canary_segment(void) +{ +#ifdef CONFIG_X86_32 + asm volatile ("mov %0, %%gs" : : "r" (0)); +#endif +} + +#endif /* CC_STACKPROTECTOR */ +#endif /* _ASM_STACKPROTECTOR_H */ Index: linux-2.6-tip/arch/x86/include/asm/string_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/string_32.h +++ linux-2.6-tip/arch/x86/include/asm/string_32.h @@ -177,10 +177,18 @@ static inline void *__memcpy3d(void *to, * No 3D Now! */ +#ifndef CONFIG_KMEMCHECK #define memcpy(t, f, n) \ (__builtin_constant_p((n)) \ ? __constant_memcpy((t), (f), (n)) \ : __memcpy((t), (f), (n))) +#else +/* + * kmemcheck becomes very happy if we use the REP instructions unconditionally, + * because it means that we know both memory operands in advance. + */ +#define memcpy(t, f, n) __memcpy((t), (f), (n)) +#endif #endif Index: linux-2.6-tip/arch/x86/include/asm/string_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/string_64.h +++ linux-2.6-tip/arch/x86/include/asm/string_64.h @@ -27,6 +27,7 @@ static __always_inline void *__inline_me function. */ #define __HAVE_ARCH_MEMCPY 1 +#ifndef CONFIG_KMEMCHECK #if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4 extern void *memcpy(void *to, const void *from, size_t len); #else @@ -42,6 +43,13 @@ extern void *__memcpy(void *to, const vo __ret; \ }) #endif +#else +/* + * kmemcheck becomes very happy if we use the REP instructions unconditionally, + * because it means that we know both memory operands in advance. + */ +#define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len)) +#endif #define __HAVE_ARCH_MEMSET void *memset(void *s, int c, size_t n); Index: linux-2.6-tip/arch/x86/include/asm/summit/apic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/summit/apic.h +++ /dev/null @@ -1,202 +0,0 @@ -#ifndef __ASM_SUMMIT_APIC_H -#define __ASM_SUMMIT_APIC_H - -#include -#include - -#define esr_disable (1) -#define NO_BALANCE_IRQ (0) - -/* In clustered mode, the high nibble of APIC ID is a cluster number. - * The low nibble is a 4-bit bitmap. */ -#define XAPIC_DEST_CPUS_SHIFT 4 -#define XAPIC_DEST_CPUS_MASK ((1u << XAPIC_DEST_CPUS_SHIFT) - 1) -#define XAPIC_DEST_CLUSTER_MASK (XAPIC_DEST_CPUS_MASK << XAPIC_DEST_CPUS_SHIFT) - -#define APIC_DFR_VALUE (APIC_DFR_CLUSTER) - -static inline const cpumask_t *target_cpus(void) -{ - /* CPU_MASK_ALL (0xff) has undefined behaviour with - * dest_LowestPrio mode logical clustered apic interrupt routing - * Just start on cpu 0. IRQ balancing will spread load - */ - return &cpumask_of_cpu(0); -} - -#define INT_DELIVERY_MODE (dest_LowestPrio) -#define INT_DEST_MODE 1 /* logical delivery broadcast to all procs */ - -static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid) -{ - return 0; -} - -/* we don't use the phys_cpu_present_map to indicate apicid presence */ -static inline unsigned long check_apicid_present(int bit) -{ - return 1; -} - -#define apicid_cluster(apicid) ((apicid) & XAPIC_DEST_CLUSTER_MASK) - -extern u8 cpu_2_logical_apicid[]; - -static inline void init_apic_ldr(void) -{ - unsigned long val, id; - int count = 0; - u8 my_id = (u8)hard_smp_processor_id(); - u8 my_cluster = (u8)apicid_cluster(my_id); -#ifdef CONFIG_SMP - u8 lid; - int i; - - /* Create logical APIC IDs by counting CPUs already in cluster. */ - for (count = 0, i = nr_cpu_ids; --i >= 0; ) { - lid = cpu_2_logical_apicid[i]; - if (lid != BAD_APICID && apicid_cluster(lid) == my_cluster) - ++count; - } -#endif - /* We only have a 4 wide bitmap in cluster mode. If a deranged - * BIOS puts 5 CPUs in one APIC cluster, we're hosed. */ - BUG_ON(count >= XAPIC_DEST_CPUS_SHIFT); - id = my_cluster | (1UL << count); - apic_write(APIC_DFR, APIC_DFR_VALUE); - val = apic_read(APIC_LDR) & ~APIC_LDR_MASK; - val |= SET_APIC_LOGICAL_ID(id); - apic_write(APIC_LDR, val); -} - -static inline int multi_timer_check(int apic, int irq) -{ - return 0; -} - -static inline int apic_id_registered(void) -{ - return 1; -} - -static inline void setup_apic_routing(void) -{ - printk("Enabling APIC mode: Summit. Using %d I/O APICs\n", - nr_ioapics); -} - -static inline int apicid_to_node(int logical_apicid) -{ -#ifdef CONFIG_SMP - return apicid_2_node[hard_smp_processor_id()]; -#else - return 0; -#endif -} - -/* Mapping from cpu number to logical apicid */ -static inline int cpu_to_logical_apicid(int cpu) -{ -#ifdef CONFIG_SMP - if (cpu >= nr_cpu_ids) - return BAD_APICID; - return (int)cpu_2_logical_apicid[cpu]; -#else - return logical_smp_processor_id(); -#endif -} - -static inline int cpu_present_to_apicid(int mps_cpu) -{ - if (mps_cpu < nr_cpu_ids) - return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu); - else - return BAD_APICID; -} - -static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_id_map) -{ - /* For clustered we don't have a good way to do this yet - hack */ - return physids_promote(0x0F); -} - -static inline physid_mask_t apicid_to_cpu_present(int apicid) -{ - return physid_mask_of_physid(0); -} - -static inline void setup_portio_remap(void) -{ -} - -static inline int check_phys_apicid_present(int boot_cpu_physical_apicid) -{ - return 1; -} - -static inline void enable_apic_mode(void) -{ -} - -static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask) -{ - int num_bits_set; - int cpus_found = 0; - int cpu; - int apicid; - - num_bits_set = cpus_weight(*cpumask); - /* Return id to all */ - if (num_bits_set >= nr_cpu_ids) - return (int) 0xFF; - /* - * The cpus in the mask must all be on the apic cluster. If are not - * on the same apicid cluster return default value of TARGET_CPUS. - */ - cpu = first_cpu(*cpumask); - apicid = cpu_to_logical_apicid(cpu); - while (cpus_found < num_bits_set) { - if (cpu_isset(cpu, *cpumask)) { - int new_apicid = cpu_to_logical_apicid(cpu); - if (apicid_cluster(apicid) != - apicid_cluster(new_apicid)){ - printk ("%s: Not a valid mask!\n", __func__); - return 0xFF; - } - apicid = apicid | new_apicid; - cpus_found++; - } - cpu++; - } - return apicid; -} - -static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *inmask, - const struct cpumask *andmask) -{ - int apicid = cpu_to_logical_apicid(0); - cpumask_var_t cpumask; - - if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC)) - return apicid; - - cpumask_and(cpumask, inmask, andmask); - cpumask_and(cpumask, cpumask, cpu_online_mask); - apicid = cpu_mask_to_apicid(cpumask); - - free_cpumask_var(cpumask); - return apicid; -} - -/* cpuid returns the value latched in the HW at reset, not the APIC ID - * register's value. For any box whose BIOS changes APIC IDs, like - * clustered APIC systems, we must use hard_smp_processor_id. - * - * See Intel's IA-32 SW Dev's Manual Vol2 under CPUID. - */ -static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb) -{ - return hard_smp_processor_id() >> index_msb; -} - -#endif /* __ASM_SUMMIT_APIC_H */ Index: linux-2.6-tip/arch/x86/include/asm/summit/apicdef.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/summit/apicdef.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_SUMMIT_APICDEF_H -#define __ASM_SUMMIT_APICDEF_H - -#define APIC_ID_MASK (0xFF<<24) - -static inline unsigned get_apic_id(unsigned long x) -{ - return (x>>24)&0xFF; -} - -#define GET_APIC_ID(x) get_apic_id(x) - -#endif Index: linux-2.6-tip/arch/x86/include/asm/summit/ipi.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/summit/ipi.h +++ /dev/null @@ -1,26 +0,0 @@ -#ifndef __ASM_SUMMIT_IPI_H -#define __ASM_SUMMIT_IPI_H - -void send_IPI_mask_sequence(const cpumask_t *mask, int vector); -void send_IPI_mask_allbutself(const cpumask_t *mask, int vector); - -static inline void send_IPI_mask(const cpumask_t *mask, int vector) -{ - send_IPI_mask_sequence(mask, vector); -} - -static inline void send_IPI_allbutself(int vector) -{ - cpumask_t mask = cpu_online_map; - cpu_clear(smp_processor_id(), mask); - - if (!cpus_empty(mask)) - send_IPI_mask(&mask, vector); -} - -static inline void send_IPI_all(int vector) -{ - send_IPI_mask(&cpu_online_map, vector); -} - -#endif /* __ASM_SUMMIT_IPI_H */ Index: linux-2.6-tip/arch/x86/include/asm/summit/mpparse.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/summit/mpparse.h +++ /dev/null @@ -1,109 +0,0 @@ -#ifndef __ASM_SUMMIT_MPPARSE_H -#define __ASM_SUMMIT_MPPARSE_H - -#include - -extern int use_cyclone; - -#ifdef CONFIG_X86_SUMMIT_NUMA -extern void setup_summit(void); -#else -#define setup_summit() {} -#endif - -static inline int mps_oem_check(struct mpc_table *mpc, char *oem, - char *productid) -{ - if (!strncmp(oem, "IBM ENSW", 8) && - (!strncmp(productid, "VIGIL SMP", 9) - || !strncmp(productid, "EXA", 3) - || !strncmp(productid, "RUTHLESS SMP", 12))){ - mark_tsc_unstable("Summit based system"); - use_cyclone = 1; /*enable cyclone-timer*/ - setup_summit(); - return 1; - } - return 0; -} - -/* Hook from generic ACPI tables.c */ -static inline int acpi_madt_oem_check(char *oem_id, char *oem_table_id) -{ - if (!strncmp(oem_id, "IBM", 3) && - (!strncmp(oem_table_id, "SERVIGIL", 8) - || !strncmp(oem_table_id, "EXA", 3))){ - mark_tsc_unstable("Summit based system"); - use_cyclone = 1; /*enable cyclone-timer*/ - setup_summit(); - return 1; - } - return 0; -} - -struct rio_table_hdr { - unsigned char version; /* Version number of this data structure */ - /* Version 3 adds chassis_num & WP_index */ - unsigned char num_scal_dev; /* # of Scalability devices (Twisters for Vigil) */ - unsigned char num_rio_dev; /* # of RIO I/O devices (Cyclones and Winnipegs) */ -} __attribute__((packed)); - -struct scal_detail { - unsigned char node_id; /* Scalability Node ID */ - unsigned long CBAR; /* Address of 1MB register space */ - unsigned char port0node; /* Node ID port connected to: 0xFF=None */ - unsigned char port0port; /* Port num port connected to: 0,1,2, or 0xFF=None */ - unsigned char port1node; /* Node ID port connected to: 0xFF = None */ - unsigned char port1port; /* Port num port connected to: 0,1,2, or 0xFF=None */ - unsigned char port2node; /* Node ID port connected to: 0xFF = None */ - unsigned char port2port; /* Port num port connected to: 0,1,2, or 0xFF=None */ - unsigned char chassis_num; /* 1 based Chassis number (1 = boot node) */ -} __attribute__((packed)); - -struct rio_detail { - unsigned char node_id; /* RIO Node ID */ - unsigned long BBAR; /* Address of 1MB register space */ - unsigned char type; /* Type of device */ - unsigned char owner_id; /* For WPEG: Node ID of Cyclone that owns this WPEG*/ - /* For CYC: Node ID of Twister that owns this CYC */ - unsigned char port0node; /* Node ID port connected to: 0xFF=None */ - unsigned char port0port; /* Port num port connected to: 0,1,2, or 0xFF=None */ - unsigned char port1node; /* Node ID port connected to: 0xFF=None */ - unsigned char port1port; /* Port num port connected to: 0,1,2, or 0xFF=None */ - unsigned char first_slot; /* For WPEG: Lowest slot number below this WPEG */ - /* For CYC: 0 */ - unsigned char status; /* For WPEG: Bit 0 = 1 : the XAPIC is used */ - /* = 0 : the XAPIC is not used, ie:*/ - /* ints fwded to another XAPIC */ - /* Bits1:7 Reserved */ - /* For CYC: Bits0:7 Reserved */ - unsigned char WP_index; /* For WPEG: WPEG instance index - lower ones have */ - /* lower slot numbers/PCI bus numbers */ - /* For CYC: No meaning */ - unsigned char chassis_num; /* 1 based Chassis number */ - /* For LookOut WPEGs this field indicates the */ - /* Expansion Chassis #, enumerated from Boot */ - /* Node WPEG external port, then Boot Node CYC */ - /* external port, then Next Vigil chassis WPEG */ - /* external port, etc. */ - /* Shared Lookouts have only 1 chassis number (the */ - /* first one assigned) */ -} __attribute__((packed)); - - -typedef enum { - CompatTwister = 0, /* Compatibility Twister */ - AltTwister = 1, /* Alternate Twister of internal 8-way */ - CompatCyclone = 2, /* Compatibility Cyclone */ - AltCyclone = 3, /* Alternate Cyclone of internal 8-way */ - CompatWPEG = 4, /* Compatibility WPEG */ - AltWPEG = 5, /* Second Planar WPEG */ - LookOutAWPEG = 6, /* LookOut WPEG */ - LookOutBWPEG = 7, /* LookOut WPEG */ -} node_type; - -static inline int is_WPEG(struct rio_detail *rio){ - return (rio->type == CompatWPEG || rio->type == AltWPEG || - rio->type == LookOutAWPEG || rio->type == LookOutBWPEG); -} - -#endif /* __ASM_SUMMIT_MPPARSE_H */ Index: linux-2.6-tip/arch/x86/include/asm/syscalls.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/syscalls.h +++ linux-2.6-tip/arch/x86/include/asm/syscalls.h @@ -29,21 +29,21 @@ asmlinkage int sys_get_thread_area(struc /* X86_32 only */ #ifdef CONFIG_X86_32 /* kernel/process_32.c */ -asmlinkage int sys_fork(struct pt_regs); -asmlinkage int sys_clone(struct pt_regs); -asmlinkage int sys_vfork(struct pt_regs); -asmlinkage int sys_execve(struct pt_regs); +int sys_fork(struct pt_regs *); +int sys_clone(struct pt_regs *); +int sys_vfork(struct pt_regs *); +int sys_execve(struct pt_regs *); /* kernel/signal_32.c */ asmlinkage int sys_sigsuspend(int, int, old_sigset_t); asmlinkage int sys_sigaction(int, const struct old_sigaction __user *, struct old_sigaction __user *); -asmlinkage int sys_sigaltstack(unsigned long); -asmlinkage unsigned long sys_sigreturn(unsigned long); -asmlinkage int sys_rt_sigreturn(unsigned long); +int sys_sigaltstack(struct pt_regs *); +unsigned long sys_sigreturn(struct pt_regs *); +long sys_rt_sigreturn(struct pt_regs *); /* kernel/ioport.c */ -asmlinkage long sys_iopl(unsigned long); +long sys_iopl(struct pt_regs *); /* kernel/sys_i386_32.c */ asmlinkage long sys_mmap2(unsigned long, unsigned long, unsigned long, @@ -59,8 +59,8 @@ struct oldold_utsname; asmlinkage int sys_olduname(struct oldold_utsname __user *); /* kernel/vm86_32.c */ -asmlinkage int sys_vm86old(struct pt_regs); -asmlinkage int sys_vm86(struct pt_regs); +int sys_vm86old(struct pt_regs *); +int sys_vm86(struct pt_regs *); #else /* CONFIG_X86_32 */ @@ -74,6 +74,7 @@ asmlinkage long sys_vfork(struct pt_regs asmlinkage long sys_execve(char __user *, char __user * __user *, char __user * __user *, struct pt_regs *); +long sys_arch_prctl(int, unsigned long); /* kernel/ioport.c */ asmlinkage long sys_iopl(unsigned int, struct pt_regs *); @@ -81,7 +82,7 @@ asmlinkage long sys_iopl(unsigned int, s /* kernel/signal_64.c */ asmlinkage long sys_sigaltstack(const stack_t __user *, stack_t __user *, struct pt_regs *); -asmlinkage long sys_rt_sigreturn(struct pt_regs *); +long sys_rt_sigreturn(struct pt_regs *); /* kernel/sys_x86_64.c */ asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long, Index: linux-2.6-tip/arch/x86/include/asm/system.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/system.h +++ linux-2.6-tip/arch/x86/include/asm/system.h @@ -20,9 +20,26 @@ struct task_struct; /* one of the stranger aspects of C forward declarations */ struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *next); +struct tss_struct; +void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, + struct tss_struct *tss); #ifdef CONFIG_X86_32 +#ifdef CONFIG_CC_STACKPROTECTOR +#define __switch_canary \ + "movl %P[task_canary](%[next]), %%ebx\n\t" \ + "movl %%ebx, "__percpu_arg([stack_canary])"\n\t" +#define __switch_canary_oparam \ + , [stack_canary] "=m" (per_cpu_var(stack_canary)) +#define __switch_canary_iparam \ + , [task_canary] "i" (offsetof(struct task_struct, stack_canary)) +#else /* CC_STACKPROTECTOR */ +#define __switch_canary +#define __switch_canary_oparam +#define __switch_canary_iparam +#endif /* CC_STACKPROTECTOR */ + /* * Saving eflags is important. It switches not only IOPL between tasks, * it also protects other tasks from NT leaking through sysenter etc. @@ -44,6 +61,7 @@ do { \ "movl %[next_sp],%%esp\n\t" /* restore ESP */ \ "movl $1f,%[prev_ip]\n\t" /* save EIP */ \ "pushl %[next_ip]\n\t" /* restore EIP */ \ + __switch_canary \ "jmp __switch_to\n" /* regparm call */ \ "1:\t" \ "popl %%ebp\n\t" /* restore EBP */ \ @@ -58,6 +76,8 @@ do { \ "=b" (ebx), "=c" (ecx), "=d" (edx), \ "=S" (esi), "=D" (edi) \ \ + __switch_canary_oparam \ + \ /* input parameters: */ \ : [next_sp] "m" (next->thread.sp), \ [next_ip] "m" (next->thread.ip), \ @@ -66,6 +86,8 @@ do { \ [prev] "a" (prev), \ [next] "d" (next) \ \ + __switch_canary_iparam \ + \ : /* reloaded segment registers */ \ "memory"); \ } while (0) @@ -86,27 +108,44 @@ do { \ , "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \ "r12", "r13", "r14", "r15" +#ifdef CONFIG_CC_STACKPROTECTOR +#define __switch_canary \ + "movq %P[task_canary](%%rsi),%%r8\n\t" \ + "movq %%r8,"__percpu_arg([gs_canary])"\n\t" +#define __switch_canary_oparam \ + , [gs_canary] "=m" (per_cpu_var(irq_stack_union.stack_canary)) +#define __switch_canary_iparam \ + , [task_canary] "i" (offsetof(struct task_struct, stack_canary)) +#else /* CC_STACKPROTECTOR */ +#define __switch_canary +#define __switch_canary_oparam +#define __switch_canary_iparam +#endif /* CC_STACKPROTECTOR */ + /* Save restore flags to clear handle leaking NT */ #define switch_to(prev, next, last) \ - asm volatile(SAVE_CONTEXT \ + asm volatile(SAVE_CONTEXT \ "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \ "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \ "call __switch_to\n\t" \ ".globl thread_return\n" \ "thread_return:\n\t" \ - "movq %%gs:%P[pda_pcurrent],%%rsi\n\t" \ + "movq "__percpu_arg([current_task])",%%rsi\n\t" \ + __switch_canary \ "movq %P[thread_info](%%rsi),%%r8\n\t" \ - LOCK_PREFIX "btr %[tif_fork],%P[ti_flags](%%r8)\n\t" \ "movq %%rax,%%rdi\n\t" \ - "jc ret_from_fork\n\t" \ + "testl %[_tif_fork],%P[ti_flags](%%r8)\n\t" \ + "jnz ret_from_fork\n\t" \ RESTORE_CONTEXT \ : "=a" (last) \ + __switch_canary_oparam \ : [next] "S" (next), [prev] "D" (prev), \ [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \ [ti_flags] "i" (offsetof(struct thread_info, flags)), \ - [tif_fork] "i" (TIF_FORK), \ + [_tif_fork] "i" (_TIF_FORK), \ [thread_info] "i" (offsetof(struct task_struct, stack)), \ - [pda_pcurrent] "i" (offsetof(struct x8664_pda, pcurrent)) \ + [current_task] "m" (per_cpu_var(current_task)) \ + __switch_canary_iparam \ : "memory", "cc" __EXTRA_CLOBBER) #endif @@ -165,6 +204,25 @@ extern void native_load_gs_index(unsigne #define savesegment(seg, value) \ asm("mov %%" #seg ",%0":"=r" (value) : : "memory") +/* + * x86_32 user gs accessors. + */ +#ifdef CONFIG_X86_32 +#ifdef CONFIG_X86_32_LAZY_GS +#define get_user_gs(regs) (u16)({unsigned long v; savesegment(gs, v); v;}) +#define set_user_gs(regs, v) loadsegment(gs, (unsigned long)(v)) +#define task_user_gs(tsk) ((tsk)->thread.gs) +#define lazy_save_gs(v) savesegment(gs, (v)) +#define lazy_load_gs(v) loadsegment(gs, (v)) +#else /* X86_32_LAZY_GS */ +#define get_user_gs(regs) (u16)((regs)->gs) +#define set_user_gs(regs, v) do { (regs)->gs = (v); } while (0) +#define task_user_gs(tsk) (task_pt_regs(tsk)->gs) +#define lazy_save_gs(v) do { } while (0) +#define lazy_load_gs(v) do { } while (0) +#endif /* X86_32_LAZY_GS */ +#endif /* X86_32 */ + static inline unsigned long get_limit(unsigned long segment) { unsigned long __limit; Index: linux-2.6-tip/arch/x86/include/asm/thread_info.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/thread_info.h +++ linux-2.6-tip/arch/x86/include/asm/thread_info.h @@ -40,6 +40,7 @@ struct thread_info { */ __u8 supervisor_stack[0]; #endif + int uaccess_err; }; #define INIT_THREAD_INFO(tsk) \ @@ -82,6 +83,7 @@ struct thread_info { #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */ #define TIF_MCE_NOTIFY 10 /* notify userspace of an MCE */ +#define TIF_PERF_COUNTERS 11 /* notify perf counter work */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ #define TIF_IA32 17 /* 32bit process */ #define TIF_FORK 18 /* ret_from_fork */ @@ -93,6 +95,7 @@ struct thread_info { #define TIF_FORCED_TF 24 /* true if TF in eflags artificially */ #define TIF_DEBUGCTLMSR 25 /* uses thread_struct.debugctlmsr */ #define TIF_DS_AREA_MSR 26 /* uses thread_struct.ds_area_msr */ +#define TIF_SYSCALL_FTRACE 27 /* for ftrace syscall instrumentation */ #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) @@ -104,6 +107,7 @@ struct thread_info { #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) #define _TIF_MCE_NOTIFY (1 << TIF_MCE_NOTIFY) +#define _TIF_PERF_COUNTERS (1 << TIF_PERF_COUNTERS) #define _TIF_NOTSC (1 << TIF_NOTSC) #define _TIF_IA32 (1 << TIF_IA32) #define _TIF_FORK (1 << TIF_FORK) @@ -114,15 +118,17 @@ struct thread_info { #define _TIF_FORCED_TF (1 << TIF_FORCED_TF) #define _TIF_DEBUGCTLMSR (1 << TIF_DEBUGCTLMSR) #define _TIF_DS_AREA_MSR (1 << TIF_DS_AREA_MSR) +#define _TIF_SYSCALL_FTRACE (1 << TIF_SYSCALL_FTRACE) /* work to do in syscall_trace_enter() */ #define _TIF_WORK_SYSCALL_ENTRY \ - (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | \ + (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_FTRACE | \ _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | _TIF_SINGLESTEP) /* work to do in syscall_trace_leave() */ #define _TIF_WORK_SYSCALL_EXIT \ - (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP) + (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP | \ + _TIF_SYSCALL_FTRACE) /* work to do on interrupt/exception return */ #define _TIF_WORK_MASK \ @@ -131,11 +137,11 @@ struct thread_info { _TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU)) /* work to do on any return to user space */ -#define _TIF_ALLWORK_MASK (0x0000FFFF & ~_TIF_SECCOMP) +#define _TIF_ALLWORK_MASK ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_FTRACE) /* Only used for 64 bit */ #define _TIF_DO_NOTIFY_MASK \ - (_TIF_SIGPENDING|_TIF_MCE_NOTIFY|_TIF_NOTIFY_RESUME) + (_TIF_SIGPENDING|_TIF_MCE_NOTIFY|_TIF_PERF_COUNTERS|_TIF_NOTIFY_RESUME) /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ @@ -148,9 +154,9 @@ struct thread_info { /* thread information allocation */ #ifdef CONFIG_DEBUG_STACK_USAGE -#define THREAD_FLAGS (GFP_KERNEL | __GFP_ZERO) +#define THREAD_FLAGS (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) #else -#define THREAD_FLAGS GFP_KERNEL +#define THREAD_FLAGS (GFP_KERNEL | __GFP_NOTRACK) #endif #define __HAVE_ARCH_THREAD_INFO_ALLOCATOR @@ -194,25 +200,21 @@ static inline struct thread_info *curren #else /* X86_32 */ -#include +#include +#define KERNEL_STACK_OFFSET (5*8) /* * macros/functions for gaining access to the thread information structure * preempt_count needs to be 1 initially, until the scheduler is functional. */ #ifndef __ASSEMBLY__ -static inline struct thread_info *current_thread_info(void) -{ - struct thread_info *ti; - ti = (void *)(read_pda(kernelstack) + PDA_STACKOFFSET - THREAD_SIZE); - return ti; -} +DECLARE_PER_CPU(unsigned long, kernel_stack); -/* do not use in interrupt context */ -static inline struct thread_info *stack_thread_info(void) +static inline struct thread_info *current_thread_info(void) { struct thread_info *ti; - asm("andq %%rsp,%0; " : "=r" (ti) : "0" (~(THREAD_SIZE - 1))); + ti = (void *)(percpu_read(kernel_stack) + + KERNEL_STACK_OFFSET - THREAD_SIZE); return ti; } @@ -220,8 +222,8 @@ static inline struct thread_info *stack_ /* how to get the thread information struct from ASM */ #define GET_THREAD_INFO(reg) \ - movq %gs:pda_kernelstack,reg ; \ - subq $(THREAD_SIZE-PDA_STACKOFFSET),reg + movq PER_CPU_VAR(kernel_stack),reg ; \ + subq $(THREAD_SIZE-KERNEL_STACK_OFFSET),reg #endif Index: linux-2.6-tip/arch/x86/include/asm/timer.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/timer.h +++ linux-2.6-tip/arch/x86/include/asm/timer.h @@ -3,6 +3,7 @@ #include #include #include +#include #define TICK_SIZE (tick_nsec / 1000) @@ -12,6 +13,7 @@ unsigned long native_calibrate_tsc(void) #ifdef CONFIG_X86_32 extern int timer_ack; extern int recalibrate_cpu_khz(void); +extern irqreturn_t timer_interrupt(int irq, void *dev_id); #endif /* CONFIG_X86_32 */ extern int no_timer_check; @@ -56,9 +58,9 @@ static inline unsigned long long cycles_ unsigned long long ns; unsigned long flags; - local_irq_save(flags); + raw_local_irq_save(flags); ns = __cycles_2_ns(cyc); - local_irq_restore(flags); + raw_local_irq_restore(flags); return ns; } Index: linux-2.6-tip/arch/x86/include/asm/tlbflush.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/tlbflush.h +++ linux-2.6-tip/arch/x86/include/asm/tlbflush.h @@ -7,6 +7,21 @@ #include #include +/* + * TLB-flush needs to be nonpreemptible on PREEMPT_RT due to the + * following complex race scenario: + * + * if the current task is lazy-TLB and does a TLB flush and + * gets preempted after the movl %%r3, %0 but before the + * movl %0, %%cr3 then its ->active_mm might change and it will + * install the wrong cr3 when it switches back. This is not a + * problem for the lazy-TLB task itself, but if the next task it + * switches to has an ->mm that is also the lazy-TLB task's + * new ->active_mm, then the scheduler will assume that cr3 is + * the new one, while we overwrote it with the old one. The result + * is the wrong cr3 in the new (non-lazy-TLB) task, which typically + * causes an infinite pagefault upon the next userspace access. + */ #ifdef CONFIG_PARAVIRT #include #else @@ -17,7 +32,9 @@ static inline void __native_flush_tlb(void) { + preempt_disable(); write_cr3(read_cr3()); + preempt_enable(); } static inline void __native_flush_tlb_global(void) @@ -95,6 +112,13 @@ static inline void __flush_tlb_one(unsig static inline void flush_tlb_mm(struct mm_struct *mm) { + /* + * This is safe on PREEMPT_RT because if we preempt + * right after the check but before the __flush_tlb(), + * and if ->active_mm changes, then we might miss a + * TLB flush, but that TLB flush happened already when + * ->active_mm was changed: + */ if (mm == current->active_mm) __flush_tlb(); } @@ -113,7 +137,7 @@ static inline void flush_tlb_range(struc __flush_tlb(); } -static inline void native_flush_tlb_others(const cpumask_t *cpumask, +static inline void native_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm, unsigned long va) { @@ -142,31 +166,28 @@ static inline void flush_tlb_range(struc flush_tlb_mm(vma->vm_mm); } -void native_flush_tlb_others(const cpumask_t *cpumask, struct mm_struct *mm, - unsigned long va); +void native_flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, unsigned long va); #define TLBSTATE_OK 1 #define TLBSTATE_LAZY 2 -#ifdef CONFIG_X86_32 struct tlb_state { struct mm_struct *active_mm; int state; - char __cacheline_padding[L1_CACHE_BYTES-8]; }; DECLARE_PER_CPU(struct tlb_state, cpu_tlbstate); -void reset_lazy_tlbstate(void); -#else static inline void reset_lazy_tlbstate(void) { + percpu_write(cpu_tlbstate.state, 0); + percpu_write(cpu_tlbstate.active_mm, &init_mm); } -#endif #endif /* SMP */ #ifndef CONFIG_PARAVIRT -#define flush_tlb_others(mask, mm, va) native_flush_tlb_others(&mask, mm, va) +#define flush_tlb_others(mask, mm, va) native_flush_tlb_others(mask, mm, va) #endif static inline void flush_tlb_kernel_range(unsigned long start, @@ -175,4 +196,6 @@ static inline void flush_tlb_kernel_rang flush_tlb_all(); } +extern void zap_low_mappings(void); + #endif /* _ASM_X86_TLBFLUSH_H */ Index: linux-2.6-tip/arch/x86/include/asm/topology.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/topology.h +++ linux-2.6-tip/arch/x86/include/asm/topology.h @@ -44,9 +44,6 @@ #ifdef CONFIG_X86_32 -/* Mappings between node number and cpus on that node. */ -extern cpumask_t node_to_cpumask_map[]; - /* Mappings between logical cpu number and node number */ extern int cpu_to_node_map[]; @@ -57,39 +54,18 @@ static inline int cpu_to_node(int cpu) } #define early_cpu_to_node(cpu) cpu_to_node(cpu) -/* Returns a bitmask of CPUs on Node 'node'. - * - * Side note: this function creates the returned cpumask on the stack - * so with a high NR_CPUS count, excessive stack space is used. The - * cpumask_of_node function should be used whenever possible. - */ -static inline cpumask_t node_to_cpumask(int node) -{ - return node_to_cpumask_map[node]; -} - -/* Returns a bitmask of CPUs on Node 'node'. */ -static inline const struct cpumask *cpumask_of_node(int node) -{ - return &node_to_cpumask_map[node]; -} - #else /* CONFIG_X86_64 */ -/* Mappings between node number and cpus on that node. */ -extern cpumask_t *node_to_cpumask_map; - /* Mappings between logical cpu number and node number */ DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map); /* Returns the number of the current Node. */ -#define numa_node_id() read_pda(nodenumber) +DECLARE_PER_CPU(int, node_number); +#define numa_node_id() percpu_read(node_number) #ifdef CONFIG_DEBUG_PER_CPU_MAPS extern int cpu_to_node(int cpu); extern int early_cpu_to_node(int cpu); -extern const cpumask_t *cpumask_of_node(int node); -extern cpumask_t node_to_cpumask(int node); #else /* !CONFIG_DEBUG_PER_CPU_MAPS */ @@ -102,37 +78,27 @@ static inline int cpu_to_node(int cpu) /* Same function but used if called before per_cpu areas are setup */ static inline int early_cpu_to_node(int cpu) { - if (early_per_cpu_ptr(x86_cpu_to_node_map)) - return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu]; - - return per_cpu(x86_cpu_to_node_map, cpu); + return early_per_cpu(x86_cpu_to_node_map, cpu); } -/* Returns a pointer to the cpumask of CPUs on Node 'node'. */ -static inline const cpumask_t *cpumask_of_node(int node) -{ - return &node_to_cpumask_map[node]; -} +#endif /* !CONFIG_DEBUG_PER_CPU_MAPS */ -/* Returns a bitmask of CPUs on Node 'node'. */ -static inline cpumask_t node_to_cpumask(int node) +#endif /* CONFIG_X86_64 */ + +/* Mappings between node number and cpus on that node. */ +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES]; + +#ifdef CONFIG_DEBUG_PER_CPU_MAPS +extern const struct cpumask *cpumask_of_node(int node); +#else +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */ +static inline const struct cpumask *cpumask_of_node(int node) { return node_to_cpumask_map[node]; } +#endif -#endif /* !CONFIG_DEBUG_PER_CPU_MAPS */ - -/* - * Replace default node_to_cpumask_ptr with optimized version - * Deprecated: use "const struct cpumask *mask = cpumask_of_node(node)" - */ -#define node_to_cpumask_ptr(v, node) \ - const cpumask_t *v = cpumask_of_node(node) - -#define node_to_cpumask_ptr_next(v, node) \ - v = cpumask_of_node(node) - -#endif /* CONFIG_X86_64 */ +extern void setup_node_to_cpumask_map(void); /* * Returns the number of the node containing Node 'node'. This @@ -141,7 +107,6 @@ static inline cpumask_t node_to_cpumask( #define parent_node(node) (node) #define pcibus_to_node(bus) __pcibus_to_node(bus) -#define pcibus_to_cpumask(bus) __pcibus_to_cpumask(bus) #ifdef CONFIG_X86_32 extern unsigned long node_start_pfn[]; @@ -192,32 +157,32 @@ extern int __node_distance(int, int); #else /* !CONFIG_NUMA */ -#define numa_node_id() 0 -#define cpu_to_node(cpu) 0 -#define early_cpu_to_node(cpu) 0 +static inline int numa_node_id(void) +{ + return 0; +} + +static inline int cpu_to_node(int cpu) +{ + return 0; +} -static inline const cpumask_t *cpumask_of_node(int node) +static inline int early_cpu_to_node(int cpu) { - return &cpu_online_map; + return 0; } -static inline cpumask_t node_to_cpumask(int node) + +static inline const struct cpumask *cpumask_of_node(int node) { - return cpu_online_map; + return cpu_online_mask; } static inline int node_to_first_cpu(int node) { - return first_cpu(cpu_online_map); + return cpumask_first(cpu_online_mask); } -/* - * Replace default node_to_cpumask_ptr with optimized version - * Deprecated: use "const struct cpumask *mask = cpumask_of_node(node)" - */ -#define node_to_cpumask_ptr(v, node) \ - const cpumask_t *v = cpumask_of_node(node) +static inline void setup_node_to_cpumask_map(void) { } -#define node_to_cpumask_ptr_next(v, node) \ - v = cpumask_of_node(node) #endif #include @@ -230,16 +195,13 @@ static inline int node_to_first_cpu(int } #endif -extern cpumask_t cpu_coregroup_map(int cpu); extern const struct cpumask *cpu_coregroup_mask(int cpu); #ifdef ENABLE_TOPO_DEFINES #define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id) #define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id) -#define topology_core_siblings(cpu) (per_cpu(cpu_core_map, cpu)) -#define topology_thread_siblings(cpu) (per_cpu(cpu_sibling_map, cpu)) -#define topology_core_cpumask(cpu) (&per_cpu(cpu_core_map, cpu)) -#define topology_thread_cpumask(cpu) (&per_cpu(cpu_sibling_map, cpu)) +#define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu)) +#define topology_thread_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu)) /* indicates that pointers to the topology cpumask_t maps are valid */ #define arch_provides_topology_pointers yes @@ -253,7 +215,7 @@ struct pci_bus; void set_pci_bus_resources_arch_default(struct pci_bus *b); #ifdef CONFIG_SMP -#define mc_capable() (cpus_weight(per_cpu(cpu_core_map, 0)) != nr_cpu_ids) +#define mc_capable() (cpumask_weight(cpu_core_mask(0)) != nr_cpu_ids) #define smt_capable() (smp_num_siblings > 1) #endif Index: linux-2.6-tip/arch/x86/include/asm/trampoline.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/trampoline.h +++ linux-2.6-tip/arch/x86/include/asm/trampoline.h @@ -13,6 +13,7 @@ extern unsigned char *trampoline_base; extern unsigned long init_rsp; extern unsigned long initial_code; +extern unsigned long initial_gs; #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE) #define TRAMPOLINE_BASE 0x6000 Index: linux-2.6-tip/arch/x86/include/asm/traps.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/traps.h +++ linux-2.6-tip/arch/x86/include/asm/traps.h @@ -41,7 +41,7 @@ dotraplinkage void do_int3(struct pt_reg dotraplinkage void do_overflow(struct pt_regs *, long); dotraplinkage void do_bounds(struct pt_regs *, long); dotraplinkage void do_invalid_op(struct pt_regs *, long); -dotraplinkage void do_device_not_available(struct pt_regs); +dotraplinkage void do_device_not_available(struct pt_regs *, long); dotraplinkage void do_coprocessor_segment_overrun(struct pt_regs *, long); dotraplinkage void do_invalid_TSS(struct pt_regs *, long); dotraplinkage void do_segment_not_present(struct pt_regs *, long); Index: linux-2.6-tip/arch/x86/include/asm/uaccess.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/uaccess.h +++ linux-2.6-tip/arch/x86/include/asm/uaccess.h @@ -121,7 +121,7 @@ extern int __get_user_bad(void); #define __get_user_x(size, ret, x, ptr) \ asm volatile("call __get_user_" #size \ - : "=a" (ret),"=d" (x) \ + : "=a" (ret), "=d" (x) \ : "0" (ptr)) \ /* Careful: we have to cast the result to the type of the pointer @@ -181,12 +181,12 @@ extern int __get_user_bad(void); #define __put_user_x(size, x, ptr, __ret_pu) \ asm volatile("call __put_user_" #size : "=a" (__ret_pu) \ - :"0" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx") + : "0" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx") #ifdef CONFIG_X86_32 -#define __put_user_u64(x, addr, err) \ +#define __put_user_asm_u64(x, addr, err, errret) \ asm volatile("1: movl %%eax,0(%2)\n" \ "2: movl %%edx,4(%2)\n" \ "3:\n" \ @@ -197,14 +197,24 @@ extern int __get_user_bad(void); _ASM_EXTABLE(1b, 4b) \ _ASM_EXTABLE(2b, 4b) \ : "=r" (err) \ - : "A" (x), "r" (addr), "i" (-EFAULT), "0" (err)) + : "A" (x), "r" (addr), "i" (errret), "0" (err)) + +#define __put_user_asm_ex_u64(x, addr) \ + asm volatile("1: movl %%eax,0(%1)\n" \ + "2: movl %%edx,4(%1)\n" \ + "3:\n" \ + _ASM_EXTABLE(1b, 2b - 1b) \ + _ASM_EXTABLE(2b, 3b - 2b) \ + : : "A" (x), "r" (addr)) #define __put_user_x8(x, ptr, __ret_pu) \ asm volatile("call __put_user_8" : "=a" (__ret_pu) \ : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx") #else -#define __put_user_u64(x, ptr, retval) \ - __put_user_asm(x, ptr, retval, "q", "", "Zr", -EFAULT) +#define __put_user_asm_u64(x, ptr, retval, errret) \ + __put_user_asm(x, ptr, retval, "q", "", "Zr", errret) +#define __put_user_asm_ex_u64(x, addr) \ + __put_user_asm_ex(x, addr, "q", "", "Zr") #define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu) #endif @@ -276,10 +286,32 @@ do { \ __put_user_asm(x, ptr, retval, "w", "w", "ir", errret); \ break; \ case 4: \ - __put_user_asm(x, ptr, retval, "l", "k", "ir", errret);\ + __put_user_asm(x, ptr, retval, "l", "k", "ir", errret); \ break; \ case 8: \ - __put_user_u64((__typeof__(*ptr))(x), ptr, retval); \ + __put_user_asm_u64((__typeof__(*ptr))(x), ptr, retval, \ + errret); \ + break; \ + default: \ + __put_user_bad(); \ + } \ +} while (0) + +#define __put_user_size_ex(x, ptr, size) \ +do { \ + __chk_user_ptr(ptr); \ + switch (size) { \ + case 1: \ + __put_user_asm_ex(x, ptr, "b", "b", "iq"); \ + break; \ + case 2: \ + __put_user_asm_ex(x, ptr, "w", "w", "ir"); \ + break; \ + case 4: \ + __put_user_asm_ex(x, ptr, "l", "k", "ir"); \ + break; \ + case 8: \ + __put_user_asm_ex_u64((__typeof__(*ptr))(x), ptr); \ break; \ default: \ __put_user_bad(); \ @@ -311,9 +343,12 @@ do { \ #ifdef CONFIG_X86_32 #define __get_user_asm_u64(x, ptr, retval, errret) (x) = __get_user_bad() +#define __get_user_asm_ex_u64(x, ptr) (x) = __get_user_bad() #else #define __get_user_asm_u64(x, ptr, retval, errret) \ __get_user_asm(x, ptr, retval, "q", "", "=r", errret) +#define __get_user_asm_ex_u64(x, ptr) \ + __get_user_asm_ex(x, ptr, "q", "", "=r") #endif #define __get_user_size(x, ptr, size, retval, errret) \ @@ -350,6 +385,33 @@ do { \ : "=r" (err), ltype(x) \ : "m" (__m(addr)), "i" (errret), "0" (err)) +#define __get_user_size_ex(x, ptr, size) \ +do { \ + __chk_user_ptr(ptr); \ + switch (size) { \ + case 1: \ + __get_user_asm_ex(x, ptr, "b", "b", "=q"); \ + break; \ + case 2: \ + __get_user_asm_ex(x, ptr, "w", "w", "=r"); \ + break; \ + case 4: \ + __get_user_asm_ex(x, ptr, "l", "k", "=r"); \ + break; \ + case 8: \ + __get_user_asm_ex_u64(x, ptr); \ + break; \ + default: \ + (x) = __get_user_bad(); \ + } \ +} while (0) + +#define __get_user_asm_ex(x, addr, itype, rtype, ltype) \ + asm volatile("1: mov"itype" %1,%"rtype"0\n" \ + "2:\n" \ + _ASM_EXTABLE(1b, 2b - 1b) \ + : ltype(x) : "m" (__m(addr))) + #define __put_user_nocheck(x, ptr, size) \ ({ \ int __pu_err; \ @@ -385,6 +447,26 @@ struct __large_struct { unsigned long bu _ASM_EXTABLE(1b, 3b) \ : "=r"(err) \ : ltype(x), "m" (__m(addr)), "i" (errret), "0" (err)) + +#define __put_user_asm_ex(x, addr, itype, rtype, ltype) \ + asm volatile("1: mov"itype" %"rtype"0,%1\n" \ + "2:\n" \ + _ASM_EXTABLE(1b, 2b - 1b) \ + : : ltype(x), "m" (__m(addr))) + +/* + * uaccess_try and catch + */ +#define uaccess_try do { \ + int prev_err = current_thread_info()->uaccess_err; \ + current_thread_info()->uaccess_err = 0; \ + barrier(); + +#define uaccess_catch(err) \ + (err) |= current_thread_info()->uaccess_err; \ + current_thread_info()->uaccess_err = prev_err; \ +} while (0) + /** * __get_user: - Get a simple variable from user space, with less checking. * @x: Variable to store result. @@ -408,6 +490,7 @@ struct __large_struct { unsigned long bu #define __get_user(x, ptr) \ __get_user_nocheck((x), (ptr), sizeof(*(ptr))) + /** * __put_user: - Write a simple value into user space, with less checking. * @x: Value to copy to user space. @@ -435,6 +518,45 @@ struct __large_struct { unsigned long bu #define __put_user_unaligned __put_user /* + * {get|put}_user_try and catch + * + * get_user_try { + * get_user_ex(...); + * } get_user_catch(err) + */ +#define get_user_try uaccess_try +#define get_user_catch(err) uaccess_catch(err) + +#define get_user_ex(x, ptr) do { \ + unsigned long __gue_val; \ + __get_user_size_ex((__gue_val), (ptr), (sizeof(*(ptr)))); \ + (x) = (__force __typeof__(*(ptr)))__gue_val; \ +} while (0) + +#ifdef CONFIG_X86_WP_WORKS_OK + +#define put_user_try uaccess_try +#define put_user_catch(err) uaccess_catch(err) + +#define put_user_ex(x, ptr) \ + __put_user_size_ex((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr))) + +#else /* !CONFIG_X86_WP_WORKS_OK */ + +#define put_user_try do { \ + int __uaccess_err = 0; + +#define put_user_catch(err) \ + (err) |= __uaccess_err; \ +} while (0) + +#define put_user_ex(x, ptr) do { \ + __uaccess_err |= __put_user(x, ptr); \ +} while (0) + +#endif /* CONFIG_X86_WP_WORKS_OK */ + +/* * movsl can be slow when source and dest are not both 8-byte aligned */ #ifdef CONFIG_X86_INTEL_USERCOPY Index: linux-2.6-tip/arch/x86/include/asm/uaccess_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/uaccess_64.h +++ linux-2.6-tip/arch/x86/include/asm/uaccess_64.h @@ -188,16 +188,16 @@ __copy_to_user_inatomic(void __user *dst extern long __copy_user_nocache(void *dst, const void __user *src, unsigned size, int zerorest); -static inline int __copy_from_user_nocache(void *dst, const void __user *src, - unsigned size) +static inline int +__copy_from_user_nocache(void *dst, const void __user *src, unsigned size) { might_sleep(); return __copy_user_nocache(dst, src, size, 1); } -static inline int __copy_from_user_inatomic_nocache(void *dst, - const void __user *src, - unsigned size) +static inline int +__copy_from_user_inatomic_nocache(void *dst, const void __user *src, + unsigned size) { return __copy_user_nocache(dst, src, size, 0); } Index: linux-2.6-tip/arch/x86/include/asm/unistd_32.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/unistd_32.h +++ linux-2.6-tip/arch/x86/include/asm/unistd_32.h @@ -338,6 +338,10 @@ #define __NR_dup3 330 #define __NR_pipe2 331 #define __NR_inotify_init1 332 +#define __NR_preadv 333 +#define __NR_pwritev 334 +#define __NR_rt_tgsigqueueinfo 335 +#define __NR_perf_counter_open 336 #ifdef __KERNEL__ Index: linux-2.6-tip/arch/x86/include/asm/unistd_64.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/unistd_64.h +++ linux-2.6-tip/arch/x86/include/asm/unistd_64.h @@ -653,7 +653,14 @@ __SYSCALL(__NR_dup3, sys_dup3) __SYSCALL(__NR_pipe2, sys_pipe2) #define __NR_inotify_init1 294 __SYSCALL(__NR_inotify_init1, sys_inotify_init1) - +#define __NR_preadv 295 +__SYSCALL(__NR_preadv, sys_ni_syscall) +#define __NR_pwritev 296 +__SYSCALL(__NR_pwritev, sys_ni_syscall) +#define __NR_rt_tgsigqueueinfo 297 +__SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo) +#define __NR_perf_counter_open 298 +__SYSCALL(__NR_perf_counter_open, sys_perf_counter_open) #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR Index: linux-2.6-tip/arch/x86/include/asm/uv/uv.h =================================================================== --- /dev/null +++ linux-2.6-tip/arch/x86/include/asm/uv/uv.h @@ -0,0 +1,33 @@ +#ifndef _ASM_X86_UV_UV_H +#define _ASM_X86_UV_UV_H + +enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC}; + +struct cpumask; +struct mm_struct; + +#ifdef CONFIG_X86_UV + +extern enum uv_system_type get_uv_system_type(void); +extern int is_uv_system(void); +extern void uv_cpu_init(void); +extern void uv_system_init(void); +extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, + unsigned long va, + unsigned int cpu); + +#else /* X86_UV */ + +static inline enum uv_system_type get_uv_system_type(void) { return UV_NONE; } +static inline int is_uv_system(void) { return 0; } +static inline void uv_cpu_init(void) { } +static inline void uv_system_init(void) { } +static inline const struct cpumask * +uv_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm, + unsigned long va, unsigned int cpu) +{ return cpumask; } + +#endif /* X86_UV */ + +#endif /* _ASM_X86_UV_UV_H */ Index: linux-2.6-tip/arch/x86/include/asm/uv/uv_bau.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/uv/uv_bau.h +++ linux-2.6-tip/arch/x86/include/asm/uv/uv_bau.h @@ -325,7 +325,6 @@ static inline void bau_cpubits_clear(str #define cpubit_isset(cpu, bau_local_cpumask) \ test_bit((cpu), (bau_local_cpumask).bits) -extern int uv_flush_tlb_others(cpumask_t *, struct mm_struct *, unsigned long); extern void uv_bau_message_intr1(void); extern void uv_bau_timeout_intr1(void); Index: linux-2.6-tip/arch/x86/include/asm/uv/uv_hub.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/uv/uv_hub.h +++ linux-2.6-tip/arch/x86/include/asm/uv/uv_hub.h @@ -199,6 +199,10 @@ DECLARE_PER_CPU(struct uv_hub_info_s, __ #define SCIR_CPU_ACTIVITY 0x02 /* not idle */ #define SCIR_CPU_HB_INTERVAL (HZ) /* once per second */ +/* Loop through all installed blades */ +#define for_each_possible_blade(bid) \ + for ((bid) = 0; (bid) < uv_num_possible_blades(); (bid)++) + /* * Macros for converting between kernel virtual addresses, socket local physical * addresses, and UV global physical addresses. Index: linux-2.6-tip/arch/x86/include/asm/vic.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/vic.h +++ /dev/null @@ -1,61 +0,0 @@ -/* Copyright (C) 1999,2001 - * - * Author: J.E.J.Bottomley@HansenPartnership.com - * - * Standard include definitions for the NCR Voyager Interrupt Controller */ - -/* The eight CPI vectors. To activate a CPI, you write a bit mask - * corresponding to the processor set to be interrupted into the - * relevant register. That set of CPUs will then be interrupted with - * the CPI */ -static const int VIC_CPI_Registers[] = - {0xFC00, 0xFC01, 0xFC08, 0xFC09, - 0xFC10, 0xFC11, 0xFC18, 0xFC19 }; - -#define VIC_PROC_WHO_AM_I 0xfc29 -# define QUAD_IDENTIFIER 0xC0 -# define EIGHT_SLOT_IDENTIFIER 0xE0 -#define QIC_EXTENDED_PROCESSOR_SELECT 0xFC72 -#define VIC_CPI_BASE_REGISTER 0xFC41 -#define VIC_PROCESSOR_ID 0xFC21 -# define VIC_CPU_MASQUERADE_ENABLE 0x8 - -#define VIC_CLAIM_REGISTER_0 0xFC38 -#define VIC_CLAIM_REGISTER_1 0xFC39 -#define VIC_REDIRECT_REGISTER_0 0xFC60 -#define VIC_REDIRECT_REGISTER_1 0xFC61 -#define VIC_PRIORITY_REGISTER 0xFC20 - -#define VIC_PRIMARY_MC_BASE 0xFC48 -#define VIC_SECONDARY_MC_BASE 0xFC49 - -#define QIC_PROCESSOR_ID 0xFC71 -# define QIC_CPUID_ENABLE 0x08 - -#define QIC_VIC_CPI_BASE_REGISTER 0xFC79 -#define QIC_CPI_BASE_REGISTER 0xFC7A - -#define QIC_MASK_REGISTER0 0xFC80 -/* NOTE: these are masked high, enabled low */ -# define QIC_PERF_TIMER 0x01 -# define QIC_LPE 0x02 -# define QIC_SYS_INT 0x04 -# define QIC_CMN_INT 0x08 -/* at the moment, just enable CMN_INT, disable SYS_INT */ -# define QIC_DEFAULT_MASK0 (~(QIC_CMN_INT /* | VIC_SYS_INT */)) -#define QIC_MASK_REGISTER1 0xFC81 -# define QIC_BOOT_CPI_MASK 0xFE -/* Enable CPI's 1-6 inclusive */ -# define QIC_CPI_ENABLE 0x81 - -#define QIC_INTERRUPT_CLEAR0 0xFC8A -#define QIC_INTERRUPT_CLEAR1 0xFC8B - -/* this is where we place the CPI vectors */ -#define VIC_DEFAULT_CPI_BASE 0xC0 -/* this is where we place the QIC CPI vectors */ -#define QIC_DEFAULT_CPI_BASE 0xD0 - -#define VIC_BOOT_INTERRUPT_MASK 0xfe - -extern void smp_vic_timer_interrupt(void); Index: linux-2.6-tip/arch/x86/include/asm/voyager.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/voyager.h +++ /dev/null @@ -1,529 +0,0 @@ -/* Copyright (C) 1999,2001 - * - * Author: J.E.J.Bottomley@HansenPartnership.com - * - * Standard include definitions for the NCR Voyager system */ - -#undef VOYAGER_DEBUG -#undef VOYAGER_CAT_DEBUG - -#ifdef VOYAGER_DEBUG -#define VDEBUG(x) printk x -#else -#define VDEBUG(x) -#endif - -/* There are three levels of voyager machine: 3,4 and 5. The rule is - * if it's less than 3435 it's a Level 3 except for a 3360 which is - * a level 4. A 3435 or above is a Level 5 */ -#define VOYAGER_LEVEL5_AND_ABOVE 0x3435 -#define VOYAGER_LEVEL4 0x3360 - -/* The L4 DINO ASIC */ -#define VOYAGER_DINO 0x43 - -/* voyager ports in standard I/O space */ -#define VOYAGER_MC_SETUP 0x96 - - -#define VOYAGER_CAT_CONFIG_PORT 0x97 -# define VOYAGER_CAT_DESELECT 0xff -#define VOYAGER_SSPB_RELOCATION_PORT 0x98 - -/* Valid CAT controller commands */ -/* start instruction register cycle */ -#define VOYAGER_CAT_IRCYC 0x01 -/* start data register cycle */ -#define VOYAGER_CAT_DRCYC 0x02 -/* move to execute state */ -#define VOYAGER_CAT_RUN 0x0F -/* end operation */ -#define VOYAGER_CAT_END 0x80 -/* hold in idle state */ -#define VOYAGER_CAT_HOLD 0x90 -/* single step an "intest" vector */ -#define VOYAGER_CAT_STEP 0xE0 -/* return cat controller to CLEMSON mode */ -#define VOYAGER_CAT_CLEMSON 0xFF - -/* the default cat command header */ -#define VOYAGER_CAT_HEADER 0x7F - -/* the range of possible CAT module ids in the system */ -#define VOYAGER_MIN_MODULE 0x10 -#define VOYAGER_MAX_MODULE 0x1f - -/* The voyager registers per asic */ -#define VOYAGER_ASIC_ID_REG 0x00 -#define VOYAGER_ASIC_TYPE_REG 0x01 -/* the sub address registers can be made auto incrementing on reads */ -#define VOYAGER_AUTO_INC_REG 0x02 -# define VOYAGER_AUTO_INC 0x04 -# define VOYAGER_NO_AUTO_INC 0xfb -#define VOYAGER_SUBADDRDATA 0x03 -#define VOYAGER_SCANPATH 0x05 -# define VOYAGER_CONNECT_ASIC 0x01 -# define VOYAGER_DISCONNECT_ASIC 0xfe -#define VOYAGER_SUBADDRLO 0x06 -#define VOYAGER_SUBADDRHI 0x07 -#define VOYAGER_SUBMODSELECT 0x08 -#define VOYAGER_SUBMODPRESENT 0x09 - -#define VOYAGER_SUBADDR_LO 0xff -#define VOYAGER_SUBADDR_HI 0xffff - -/* the maximum size of a scan path -- used to form instructions */ -#define VOYAGER_MAX_SCAN_PATH 0x100 -/* the biggest possible register size (in bytes) */ -#define VOYAGER_MAX_REG_SIZE 4 - -/* Total number of possible modules (including submodules) */ -#define VOYAGER_MAX_MODULES 16 -/* Largest number of asics per module */ -#define VOYAGER_MAX_ASICS_PER_MODULE 7 - -/* the CAT asic of each module is always the first one */ -#define VOYAGER_CAT_ID 0 -#define VOYAGER_PSI 0x1a - -/* voyager instruction operations and registers */ -#define VOYAGER_READ_CONFIG 0x1 -#define VOYAGER_WRITE_CONFIG 0x2 -#define VOYAGER_BYPASS 0xff - -typedef struct voyager_asic { - __u8 asic_addr; /* ASIC address; Level 4 */ - __u8 asic_type; /* ASIC type */ - __u8 asic_id; /* ASIC id */ - __u8 jtag_id[4]; /* JTAG id */ - __u8 asic_location; /* Location within scan path; start w/ 0 */ - __u8 bit_location; /* Location within bit stream; start w/ 0 */ - __u8 ireg_length; /* Instruction register length */ - __u16 subaddr; /* Amount of sub address space */ - struct voyager_asic *next; /* Next asic in linked list */ -} voyager_asic_t; - -typedef struct voyager_module { - __u8 module_addr; /* Module address */ - __u8 scan_path_connected; /* Scan path connected */ - __u16 ee_size; /* Size of the EEPROM */ - __u16 num_asics; /* Number of Asics */ - __u16 inst_bits; /* Instruction bits in the scan path */ - __u16 largest_reg; /* Largest register in the scan path */ - __u16 smallest_reg; /* Smallest register in the scan path */ - voyager_asic_t *asic; /* First ASIC in scan path (CAT_I) */ - struct voyager_module *submodule; /* Submodule pointer */ - struct voyager_module *next; /* Next module in linked list */ -} voyager_module_t; - -typedef struct voyager_eeprom_hdr { - __u8 module_id[4]; - __u8 version_id; - __u8 config_id; - __u16 boundry_id; /* boundary scan id */ - __u16 ee_size; /* size of EEPROM */ - __u8 assembly[11]; /* assembly # */ - __u8 assembly_rev; /* assembly rev */ - __u8 tracer[4]; /* tracer number */ - __u16 assembly_cksum; /* asm checksum */ - __u16 power_consump; /* pwr requirements */ - __u16 num_asics; /* number of asics */ - __u16 bist_time; /* min. bist time */ - __u16 err_log_offset; /* error log offset */ - __u16 scan_path_offset;/* scan path offset */ - __u16 cct_offset; - __u16 log_length; /* length of err log */ - __u16 xsum_end; /* offset to end of - checksum */ - __u8 reserved[4]; - __u8 sflag; /* starting sentinal */ - __u8 part_number[13]; /* prom part number */ - __u8 version[10]; /* version number */ - __u8 signature[8]; - __u16 eeprom_chksum; - __u32 data_stamp_offset; - __u8 eflag ; /* ending sentinal */ -} __attribute__((packed)) voyager_eprom_hdr_t; - - - -#define VOYAGER_EPROM_SIZE_OFFSET \ - ((__u16)(&(((voyager_eprom_hdr_t *)0)->ee_size))) -#define VOYAGER_XSUM_END_OFFSET 0x2a - -/* the following three definitions are for internal table layouts - * in the module EPROMs. We really only care about the IDs and - * offsets */ -typedef struct voyager_sp_table { - __u8 asic_id; - __u8 bypass_flag; - __u16 asic_data_offset; - __u16 config_data_offset; -} __attribute__((packed)) voyager_sp_table_t; - -typedef struct voyager_jtag_table { - __u8 icode[4]; - __u8 runbist[4]; - __u8 intest[4]; - __u8 samp_preld[4]; - __u8 ireg_len; -} __attribute__((packed)) voyager_jtt_t; - -typedef struct voyager_asic_data_table { - __u8 jtag_id[4]; - __u16 length_bsr; - __u16 length_bist_reg; - __u32 bist_clk; - __u16 subaddr_bits; - __u16 seed_bits; - __u16 sig_bits; - __u16 jtag_offset; -} __attribute__((packed)) voyager_at_t; - -/* Voyager Interrupt Controller (VIC) registers */ - -/* Base to add to Cross Processor Interrupts (CPIs) when triggering - * the CPU IRQ line */ -/* register defines for the WCBICs (one per processor) */ -#define VOYAGER_WCBIC0 0x41 /* bus A node P1 processor 0 */ -#define VOYAGER_WCBIC1 0x49 /* bus A node P1 processor 1 */ -#define VOYAGER_WCBIC2 0x51 /* bus A node P2 processor 0 */ -#define VOYAGER_WCBIC3 0x59 /* bus A node P2 processor 1 */ -#define VOYAGER_WCBIC4 0x61 /* bus B node P1 processor 0 */ -#define VOYAGER_WCBIC5 0x69 /* bus B node P1 processor 1 */ -#define VOYAGER_WCBIC6 0x71 /* bus B node P2 processor 0 */ -#define VOYAGER_WCBIC7 0x79 /* bus B node P2 processor 1 */ - - -/* top of memory registers */ -#define VOYAGER_WCBIC_TOM_L 0x4 -#define VOYAGER_WCBIC_TOM_H 0x5 - -/* register defines for Voyager Memory Contol (VMC) - * these are present on L4 machines only */ -#define VOYAGER_VMC1 0x81 -#define VOYAGER_VMC2 0x91 -#define VOYAGER_VMC3 0xa1 -#define VOYAGER_VMC4 0xb1 - -/* VMC Ports */ -#define VOYAGER_VMC_MEMORY_SETUP 0x9 -# define VMC_Interleaving 0x01 -# define VMC_4Way 0x02 -# define VMC_EvenCacheLines 0x04 -# define VMC_HighLine 0x08 -# define VMC_Start0_Enable 0x20 -# define VMC_Start1_Enable 0x40 -# define VMC_Vremap 0x80 -#define VOYAGER_VMC_BANK_DENSITY 0xa -# define VMC_BANK_EMPTY 0 -# define VMC_BANK_4MB 1 -# define VMC_BANK_16MB 2 -# define VMC_BANK_64MB 3 -# define VMC_BANK0_MASK 0x03 -# define VMC_BANK1_MASK 0x0C -# define VMC_BANK2_MASK 0x30 -# define VMC_BANK3_MASK 0xC0 - -/* Magellan Memory Controller (MMC) defines - present on L5 */ -#define VOYAGER_MMC_ASIC_ID 1 -/* the two memory modules corresponding to memory cards in the system */ -#define VOYAGER_MMC_MEMORY0_MODULE 0x14 -#define VOYAGER_MMC_MEMORY1_MODULE 0x15 -/* the Magellan Memory Address (MMA) defines */ -#define VOYAGER_MMA_ASIC_ID 2 - -/* Submodule number for the Quad Baseboard */ -#define VOYAGER_QUAD_BASEBOARD 1 - -/* ASIC defines for the Quad Baseboard */ -#define VOYAGER_QUAD_QDATA0 1 -#define VOYAGER_QUAD_QDATA1 2 -#define VOYAGER_QUAD_QABC 3 - -/* Useful areas in extended CMOS */ -#define VOYAGER_PROCESSOR_PRESENT_MASK 0x88a -#define VOYAGER_MEMORY_CLICKMAP 0xa23 -#define VOYAGER_DUMP_LOCATION 0xb1a - -/* SUS In Control bit - used to tell SUS that we don't need to be - * babysat anymore */ -#define VOYAGER_SUS_IN_CONTROL_PORT 0x3ff -# define VOYAGER_IN_CONTROL_FLAG 0x80 - -/* Voyager PSI defines */ -#define VOYAGER_PSI_STATUS_REG 0x08 -# define PSI_DC_FAIL 0x01 -# define PSI_MON 0x02 -# define PSI_FAULT 0x04 -# define PSI_ALARM 0x08 -# define PSI_CURRENT 0x10 -# define PSI_DVM 0x20 -# define PSI_PSCFAULT 0x40 -# define PSI_STAT_CHG 0x80 - -#define VOYAGER_PSI_SUPPLY_REG 0x8000 - /* read */ -# define PSI_FAIL_DC 0x01 -# define PSI_FAIL_AC 0x02 -# define PSI_MON_INT 0x04 -# define PSI_SWITCH_OFF 0x08 -# define PSI_HX_OFF 0x10 -# define PSI_SECURITY 0x20 -# define PSI_CMOS_BATT_LOW 0x40 -# define PSI_CMOS_BATT_FAIL 0x80 - /* write */ -# define PSI_CLR_SWITCH_OFF 0x13 -# define PSI_CLR_HX_OFF 0x14 -# define PSI_CLR_CMOS_BATT_FAIL 0x17 - -#define VOYAGER_PSI_MASK 0x8001 -# define PSI_MASK_MASK 0x10 - -#define VOYAGER_PSI_AC_FAIL_REG 0x8004 -#define AC_FAIL_STAT_CHANGE 0x80 - -#define VOYAGER_PSI_GENERAL_REG 0x8007 - /* read */ -# define PSI_SWITCH_ON 0x01 -# define PSI_SWITCH_ENABLED 0x02 -# define PSI_ALARM_ENABLED 0x08 -# define PSI_SECURE_ENABLED 0x10 -# define PSI_COLD_RESET 0x20 -# define PSI_COLD_START 0x80 - /* write */ -# define PSI_POWER_DOWN 0x10 -# define PSI_SWITCH_DISABLE 0x01 -# define PSI_SWITCH_ENABLE 0x11 -# define PSI_CLEAR 0x12 -# define PSI_ALARM_DISABLE 0x03 -# define PSI_ALARM_ENABLE 0x13 -# define PSI_CLEAR_COLD_RESET 0x05 -# define PSI_SET_COLD_RESET 0x15 -# define PSI_CLEAR_COLD_START 0x07 -# define PSI_SET_COLD_START 0x17 - - - -struct voyager_bios_info { - __u8 len; - __u8 major; - __u8 minor; - __u8 debug; - __u8 num_classes; - __u8 class_1; - __u8 class_2; -}; - -/* The following structures and definitions are for the Kernel/SUS - * interface these are needed to find out how SUS initialised any Quad - * boards in the system */ - -#define NUMBER_OF_MC_BUSSES 2 -#define SLOTS_PER_MC_BUS 8 -#define MAX_CPUS 16 /* 16 way CPU system */ -#define MAX_PROCESSOR_BOARDS 4 /* 4 processor slot system */ -#define MAX_CACHE_LEVELS 4 /* # of cache levels supported */ -#define MAX_SHARED_CPUS 4 /* # of CPUs that can share a LARC */ -#define NUMBER_OF_POS_REGS 8 - -typedef struct { - __u8 MC_Slot; - __u8 POS_Values[NUMBER_OF_POS_REGS]; -} __attribute__((packed)) MC_SlotInformation_t; - -struct QuadDescription { - __u8 Type; /* for type 0 (DYADIC or MONADIC) all fields - * will be zero except for slot */ - __u8 StructureVersion; - __u32 CPI_BaseAddress; - __u32 LARC_BankSize; - __u32 LocalMemoryStateBits; - __u8 Slot; /* Processor slots 1 - 4 */ -} __attribute__((packed)); - -struct ProcBoardInfo { - __u8 Type; - __u8 StructureVersion; - __u8 NumberOfBoards; - struct QuadDescription QuadData[MAX_PROCESSOR_BOARDS]; -} __attribute__((packed)); - -struct CacheDescription { - __u8 Level; - __u32 TotalSize; - __u16 LineSize; - __u8 Associativity; - __u8 CacheType; - __u8 WriteType; - __u8 Number_CPUs_SharedBy; - __u8 Shared_CPUs_Hardware_IDs[MAX_SHARED_CPUS]; - -} __attribute__((packed)); - -struct CPU_Description { - __u8 CPU_HardwareId; - char *FRU_String; - __u8 NumberOfCacheLevels; - struct CacheDescription CacheLevelData[MAX_CACHE_LEVELS]; -} __attribute__((packed)); - -struct CPU_Info { - __u8 Type; - __u8 StructureVersion; - __u8 NumberOf_CPUs; - struct CPU_Description CPU_Data[MAX_CPUS]; -} __attribute__((packed)); - - -/* - * This structure will be used by SUS and the OS. - * The assumption about this structure is that no blank space is - * packed in it by our friend the compiler. - */ -typedef struct { - __u8 Mailbox_SUS; /* Written to by SUS to give - commands/response to the OS */ - __u8 Mailbox_OS; /* Written to by the OS to give - commands/response to SUS */ - __u8 SUS_MailboxVersion; /* Tells the OS which iteration of the - interface SUS supports */ - __u8 OS_MailboxVersion; /* Tells SUS which iteration of the - interface the OS supports */ - __u32 OS_Flags; /* Flags set by the OS as info for - SUS */ - __u32 SUS_Flags; /* Flags set by SUS as info - for the OS */ - __u32 WatchDogPeriod; /* Watchdog period (in seconds) which - the DP uses to see if the OS - is dead */ - __u32 WatchDogCount; /* Updated by the OS on every tic. */ - __u32 MemoryFor_SUS_ErrorLog; /* Flat 32 bit address which tells SUS - where to stuff the SUS error log - on a dump */ - MC_SlotInformation_t MC_SlotInfo[NUMBER_OF_MC_BUSSES*SLOTS_PER_MC_BUS]; - /* Storage for MCA POS data */ - /* All new SECOND_PASS_INTERFACE fields added from this point */ - struct ProcBoardInfo *BoardData; - struct CPU_Info *CPU_Data; - /* All new fields must be added from this point */ -} Voyager_KernelSUS_Mbox_t; - -/* structure for finding the right memory address to send a QIC CPI to */ -struct voyager_qic_cpi { - /* Each cache line (32 bytes) can trigger a cpi. The cpi - * read/write may occur anywhere in the cache line---pick the - * middle to be safe */ - struct { - __u32 pad1[3]; - __u32 cpi; - __u32 pad2[4]; - } qic_cpi[8]; -}; - -struct voyager_status { - __u32 power_fail:1; - __u32 switch_off:1; - __u32 request_from_kernel:1; -}; - -struct voyager_psi_regs { - __u8 cat_id; - __u8 cat_dev; - __u8 cat_control; - __u8 subaddr; - __u8 dummy4; - __u8 checkbit; - __u8 subaddr_low; - __u8 subaddr_high; - __u8 intstatus; - __u8 stat1; - __u8 stat3; - __u8 fault; - __u8 tms; - __u8 gen; - __u8 sysconf; - __u8 dummy15; -}; - -struct voyager_psi_subregs { - __u8 supply; - __u8 mask; - __u8 present; - __u8 DCfail; - __u8 ACfail; - __u8 fail; - __u8 UPSfail; - __u8 genstatus; -}; - -struct voyager_psi { - struct voyager_psi_regs regs; - struct voyager_psi_subregs subregs; -}; - -struct voyager_SUS { -#define VOYAGER_DUMP_BUTTON_NMI 0x1 -#define VOYAGER_SUS_VALID 0x2 -#define VOYAGER_SYSINT_COMPLETE 0x3 - __u8 SUS_mbox; -#define VOYAGER_NO_COMMAND 0x0 -#define VOYAGER_IGNORE_DUMP 0x1 -#define VOYAGER_DO_DUMP 0x2 -#define VOYAGER_SYSINT_HANDSHAKE 0x3 -#define VOYAGER_DO_MEM_DUMP 0x4 -#define VOYAGER_SYSINT_WAS_RECOVERED 0x5 - __u8 kernel_mbox; -#define VOYAGER_MAILBOX_VERSION 0x10 - __u8 SUS_version; - __u8 kernel_version; -#define VOYAGER_OS_HAS_SYSINT 0x1 -#define VOYAGER_OS_IN_PROGRESS 0x2 -#define VOYAGER_UPDATING_WDPERIOD 0x4 - __u32 kernel_flags; -#define VOYAGER_SUS_BOOTING 0x1 -#define VOYAGER_SUS_IN_PROGRESS 0x2 - __u32 SUS_flags; - __u32 watchdog_period; - __u32 watchdog_count; - __u32 SUS_errorlog; - /* lots of system configuration stuff under here */ -}; - -/* Variables exported by voyager_smp */ -extern __u32 voyager_extended_vic_processors; -extern __u32 voyager_allowed_boot_processors; -extern __u32 voyager_quad_processors; -extern struct voyager_qic_cpi *voyager_quad_cpi_addr[NR_CPUS]; -extern struct voyager_SUS *voyager_SUS; - -/* variables exported always */ -extern struct task_struct *voyager_thread; -extern int voyager_level; -extern struct voyager_status voyager_status; - -/* functions exported by the voyager and voyager_smp modules */ -extern int voyager_cat_readb(__u8 module, __u8 asic, int reg); -extern void voyager_cat_init(void); -extern void voyager_detect(struct voyager_bios_info *); -extern void voyager_trap_init(void); -extern void voyager_setup_irqs(void); -extern int voyager_memory_detect(int region, __u32 *addr, __u32 *length); -extern void voyager_smp_intr_init(void); -extern __u8 voyager_extended_cmos_read(__u16 cmos_address); -extern void voyager_smp_dump(void); -extern void voyager_timer_interrupt(void); -extern void smp_local_timer_interrupt(void); -extern void voyager_power_off(void); -extern void smp_voyager_power_off(void *dummy); -extern void voyager_restart(void); -extern void voyager_cat_power_off(void); -extern void voyager_cat_do_common_interrupt(void); -extern void voyager_handle_nmi(void); -extern void voyager_smp_intr_init(void); -/* Commands for the following are */ -#define VOYAGER_PSI_READ 0 -#define VOYAGER_PSI_WRITE 1 -#define VOYAGER_PSI_SUBREAD 2 -#define VOYAGER_PSI_SUBWRITE 3 -extern void voyager_cat_psi(__u8, __u16, __u8 *); Index: linux-2.6-tip/arch/x86/include/asm/xen/events.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/xen/events.h +++ linux-2.6-tip/arch/x86/include/asm/xen/events.h @@ -15,10 +15,4 @@ static inline int xen_irqs_disabled(stru return raw_irqs_disabled_flags(regs->flags); } -static inline void xen_do_IRQ(int irq, struct pt_regs *regs) -{ - regs->orig_ax = ~irq; - do_IRQ(regs); -} - #endif /* _ASM_X86_XEN_EVENTS_H */ Index: linux-2.6-tip/arch/x86/include/asm/xen/hypercall.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/xen/hypercall.h +++ linux-2.6-tip/arch/x86/include/asm/xen/hypercall.h @@ -296,6 +296,8 @@ HYPERVISOR_get_debugreg(int reg) static inline int HYPERVISOR_update_descriptor(u64 ma, u64 desc) { + if (sizeof(u64) == sizeof(long)) + return _hypercall2(int, update_descriptor, ma, desc); return _hypercall4(int, update_descriptor, ma, ma>>32, desc, desc>>32); } Index: linux-2.6-tip/arch/x86/include/asm/xen/hypervisor.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/xen/hypervisor.h +++ linux-2.6-tip/arch/x86/include/asm/xen/hypervisor.h @@ -38,22 +38,30 @@ extern struct shared_info *HYPERVISOR_sh extern struct start_info *xen_start_info; enum xen_domain_type { - XEN_NATIVE, - XEN_PV_DOMAIN, - XEN_HVM_DOMAIN, + XEN_NATIVE, /* running on bare hardware */ + XEN_PV_DOMAIN, /* running in a PV domain */ + XEN_HVM_DOMAIN, /* running in a Xen hvm domain */ }; -extern enum xen_domain_type xen_domain_type; - #ifdef CONFIG_XEN -#define xen_domain() (xen_domain_type != XEN_NATIVE) +extern enum xen_domain_type xen_domain_type; #else -#define xen_domain() (0) +#define xen_domain_type XEN_NATIVE #endif -#define xen_pv_domain() (xen_domain() && xen_domain_type == XEN_PV_DOMAIN) -#define xen_hvm_domain() (xen_domain() && xen_domain_type == XEN_HVM_DOMAIN) - -#define xen_initial_domain() (xen_pv_domain() && xen_start_info->flags & SIF_INITDOMAIN) +#define xen_domain() (xen_domain_type != XEN_NATIVE) +#define xen_pv_domain() (xen_domain() && \ + xen_domain_type == XEN_PV_DOMAIN) +#define xen_hvm_domain() (xen_domain() && \ + xen_domain_type == XEN_HVM_DOMAIN) + +#ifdef CONFIG_XEN_DOM0 +#include + +#define xen_initial_domain() (xen_pv_domain() && \ + xen_start_info->flags & SIF_INITDOMAIN) +#else /* !CONFIG_XEN_DOM0 */ +#define xen_initial_domain() (0) +#endif /* CONFIG_XEN_DOM0 */ #endif /* _ASM_X86_XEN_HYPERVISOR_H */ Index: linux-2.6-tip/arch/x86/include/asm/xen/page.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/xen/page.h +++ linux-2.6-tip/arch/x86/include/asm/xen/page.h @@ -164,6 +164,7 @@ static inline pte_t __pte_ma(pteval_t x) xmaddr_t arbitrary_virt_to_machine(void *address); +unsigned long arbitrary_virt_to_mfn(void *vaddr); void make_lowmem_page_readonly(void *vaddr); void make_lowmem_page_readwrite(void *vaddr); Index: linux-2.6-tip/arch/x86/include/asm/xor.h =================================================================== --- linux-2.6-tip.orig/arch/x86/include/asm/xor.h +++ linux-2.6-tip/arch/x86/include/asm/xor.h @@ -1,5 +1,10 @@ +#ifdef CONFIG_KMEMCHECK +/* kmemcheck doesn't handle MMX/SSE/SSE2 instructions */ +# include +#else #ifdef CONFIG_X86_32 # include "xor_32.h" #else # include "xor_64.h" #endif +#endif Index: linux-2.6-tip/arch/x86/kernel/Makefile =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/Makefile +++ linux-2.6-tip/arch/x86/kernel/Makefile @@ -23,11 +23,12 @@ nostackp := $(call cc-option, -fno-stack CFLAGS_vsyscall_64.o := $(PROFILING) -g0 $(nostackp) CFLAGS_hpet.o := $(nostackp) CFLAGS_tsc.o := $(nostackp) +CFLAGS_paravirt.o := $(nostackp) obj-y := process_$(BITS).o signal.o entry_$(BITS).o obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o obj-y += time_$(BITS).o ioport.o ldt.o dumpstack.o -obj-y += setup.o i8259.o irqinit_$(BITS).o setup_percpu.o +obj-y += setup.o i8259.o irqinit_$(BITS).o obj-$(CONFIG_X86_VISWS) += visws_quirks.o obj-$(CONFIG_X86_32) += probe_roms_32.o obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o @@ -49,31 +50,28 @@ obj-y += step.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-y += cpu/ obj-y += acpi/ -obj-$(CONFIG_X86_BIOS_REBOOT) += reboot.o +obj-y += reboot.o obj-$(CONFIG_MCA) += mca_32.o obj-$(CONFIG_X86_MSR) += msr.o obj-$(CONFIG_X86_CPUID) += cpuid.o obj-$(CONFIG_PCI) += early-quirks.o apm-y := apm_32.o obj-$(CONFIG_APM) += apm.o -obj-$(CONFIG_X86_SMP) += smp.o -obj-$(CONFIG_X86_SMP) += smpboot.o tsc_sync.o ipi.o tlb_$(BITS).o -obj-$(CONFIG_X86_32_SMP) += smpcommon.o -obj-$(CONFIG_X86_64_SMP) += tsc_sync.o smpcommon.o +obj-$(CONFIG_SMP) += smp.o +obj-$(CONFIG_SMP) += smpboot.o tsc_sync.o +obj-$(CONFIG_SMP) += setup_percpu.o +obj-$(CONFIG_X86_64_SMP) += tsc_sync.o obj-$(CONFIG_X86_TRAMPOLINE) += trampoline_$(BITS).o obj-$(CONFIG_X86_MPPARSE) += mpparse.o -obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o -obj-$(CONFIG_X86_IO_APIC) += io_apic.o +obj-y += apic/ obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups_32.o obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o -obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o +obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o +obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o -obj-$(CONFIG_X86_NUMAQ) += numaq_32.o -obj-$(CONFIG_X86_ES7000) += es7000_32.o -obj-$(CONFIG_X86_SUMMIT_NUMA) += summit_32.o -obj-y += vsmp_64.o +obj-$(CONFIG_X86_VSMP) += vsmp_64.o obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_MODULES) += module_$(BITS).o obj-$(CONFIG_EFI) += efi.o efi_$(BITS).o efi_stub_$(BITS).o @@ -109,21 +107,18 @@ obj-$(CONFIG_MICROCODE) += microcode.o obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o -obj-$(CONFIG_SWIOTLB) += pci-swiotlb_64.o # NB rename without _64 +obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) - obj-y += genapic_64.o genapic_flat_64.o genx2apic_uv_x.o tlb_uv.o - obj-y += bios_uv.o uv_irq.o uv_sysfs.o - obj-y += genx2apic_cluster.o - obj-y += genx2apic_phys.o - obj-$(CONFIG_X86_PM_TIMER) += pmtimer_64.o - obj-$(CONFIG_AUDIT) += audit_64.o - - obj-$(CONFIG_GART_IOMMU) += pci-gart_64.o aperture_64.o - obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o - obj-$(CONFIG_AMD_IOMMU) += amd_iommu_init.o amd_iommu.o + obj-$(CONFIG_X86_UV) += tlb_uv.o bios_uv.o uv_irq.o uv_sysfs.o uv_time.o + obj-$(CONFIG_X86_PM_TIMER) += pmtimer_64.o + obj-$(CONFIG_AUDIT) += audit_64.o + + obj-$(CONFIG_GART_IOMMU) += pci-gart_64.o aperture_64.o + obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o + obj-$(CONFIG_AMD_IOMMU) += amd_iommu_init.o amd_iommu.o - obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o + obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o endif Index: linux-2.6-tip/arch/x86/kernel/acpi/boot.c =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/acpi/boot.c +++ linux-2.6-tip/arch/x86/kernel/acpi/boot.c @@ -37,15 +37,10 @@ #include #include #include -#include #include #include #include -#ifdef CONFIG_X86_LOCAL_APIC -# include -#endif - static int __initdata acpi_force = 0; u32 acpi_rsdt_forced; #ifdef CONFIG_ACPI @@ -56,16 +51,7 @@ int acpi_disabled = 1; EXPORT_SYMBOL(acpi_disabled); #ifdef CONFIG_X86_64 - -#include - -#else /* X86 */ - -#ifdef CONFIG_X86_LOCAL_APIC -#include -#include -#endif /* CONFIG_X86_LOCAL_APIC */ - +# include #endif /* X86 */ #define BAD_MADT_ENTRY(entry, end) ( \ @@ -121,35 +107,18 @@ enum acpi_irq_model_id acpi_irq_model = */ char *__init __acpi_map_table(unsigned long phys, unsigned long size) { - unsigned long base, offset, mapped_size; - int idx; if (!phys || !size) return NULL; - if (phys+size <= (max_low_pfn_mapped << PAGE_SHIFT)) - return __va(phys); - - offset = phys & (PAGE_SIZE - 1); - mapped_size = PAGE_SIZE - offset; - clear_fixmap(FIX_ACPI_END); - set_fixmap(FIX_ACPI_END, phys); - base = fix_to_virt(FIX_ACPI_END); - - /* - * Most cases can be covered by the below. - */ - idx = FIX_ACPI_END; - while (mapped_size < size) { - if (--idx < FIX_ACPI_BEGIN) - return NULL; /* cannot handle this */ - phys += PAGE_SIZE; - clear_fixmap(idx); - set_fixmap(idx, phys); - mapped_size += PAGE_SIZE; - } + return early_ioremap(phys, size); +} +void __init __acpi_unmap_table(char *map, unsigned long size) +{ + if (!map || !size) + return; - return ((unsigned char *)base + offset); + early_iounmap(map, size); } #ifdef CONFIG_PCI_MMCONFIG @@ -239,7 +208,8 @@ static int __init acpi_parse_madt(struct madt->address); } - acpi_madt_oem_check(madt->header.oem_id, madt->header.oem_table_id); + default_acpi_madt_oem_check(madt->header.oem_id, + madt->header.oem_table_id); return 0; } @@ -884,7 +854,7 @@ static struct { DECLARE_BITMAP(pin_programmed, MP_MAX_IOAPIC_PIN + 1); } mp_ioapic_routing[MAX_IO_APICS]; -static int mp_find_ioapic(int gsi) +int mp_find_ioapic(int gsi) { int i = 0; @@ -899,6 +869,16 @@ static int mp_find_ioapic(int gsi) return -1; } +int mp_find_ioapic_pin(int ioapic, int gsi) +{ + if (WARN_ON(ioapic == -1)) + return -1; + if (WARN_ON(gsi > mp_ioapic_routing[ioapic].gsi_end)) + return -1; + + return gsi - mp_ioapic_routing[ioapic].gsi_base; +} + static u8 __init uniq_ioapic_id(u8 id) { #ifdef CONFIG_X86_32 @@ -912,8 +892,8 @@ static u8 __init uniq_ioapic_id(u8 id) DECLARE_BITMAP(used, 256); bitmap_zero(used, 256); for (i = 0; i < nr_ioapics; i++) { - struct mp_config_ioapic *ia = &mp_ioapics[i]; - __set_bit(ia->mp_apicid, used); + struct mpc_ioapic *ia = &mp_ioapics[i]; + __set_bit(ia->apicid, used); } if (!test_bit(id, used)) return id; @@ -945,29 +925,29 @@ void __init mp_register_ioapic(int id, u idx = nr_ioapics; - mp_ioapics[idx].mp_type = MP_IOAPIC; - mp_ioapics[idx].mp_flags = MPC_APIC_USABLE; - mp_ioapics[idx].mp_apicaddr = address; + mp_ioapics[idx].type = MP_IOAPIC; + mp_ioapics[idx].flags = MPC_APIC_USABLE; + mp_ioapics[idx].apicaddr = address; set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address); - mp_ioapics[idx].mp_apicid = uniq_ioapic_id(id); + mp_ioapics[idx].apicid = uniq_ioapic_id(id); #ifdef CONFIG_X86_32 - mp_ioapics[idx].mp_apicver = io_apic_get_version(idx); + mp_ioapics[idx].apicver = io_apic_get_version(idx); #else - mp_ioapics[idx].mp_apicver = 0; + mp_ioapics[idx].apicver = 0; #endif /* * Build basic GSI lookup table to facilitate gsi->io_apic lookups * and to prevent reprogramming of IOAPIC pins (PCI GSIs). */ - mp_ioapic_routing[idx].apic_id = mp_ioapics[idx].mp_apicid; + mp_ioapic_routing[idx].apic_id = mp_ioapics[idx].apicid; mp_ioapic_routing[idx].gsi_base = gsi_base; mp_ioapic_routing[idx].gsi_end = gsi_base + io_apic_get_redir_entries(idx); - printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%lx, " - "GSI %d-%d\n", idx, mp_ioapics[idx].mp_apicid, - mp_ioapics[idx].mp_apicver, mp_ioapics[idx].mp_apicaddr, + printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%x, " + "GSI %d-%d\n", idx, mp_ioapics[idx].apicid, + mp_ioapics[idx].apicver, mp_ioapics[idx].apicaddr, mp_ioapic_routing[idx].gsi_base, mp_ioapic_routing[idx].gsi_end); nr_ioapics++; @@ -996,19 +976,19 @@ int __init acpi_probe_gsi(void) return max_gsi + 1; } -static void assign_to_mp_irq(struct mp_config_intsrc *m, - struct mp_config_intsrc *mp_irq) +static void assign_to_mp_irq(struct mpc_intsrc *m, + struct mpc_intsrc *mp_irq) { - memcpy(mp_irq, m, sizeof(struct mp_config_intsrc)); + memcpy(mp_irq, m, sizeof(struct mpc_intsrc)); } -static int mp_irq_cmp(struct mp_config_intsrc *mp_irq, - struct mp_config_intsrc *m) +static int mp_irq_cmp(struct mpc_intsrc *mp_irq, + struct mpc_intsrc *m) { - return memcmp(mp_irq, m, sizeof(struct mp_config_intsrc)); + return memcmp(mp_irq, m, sizeof(struct mpc_intsrc)); } -static void save_mp_irq(struct mp_config_intsrc *m) +static void save_mp_irq(struct mpc_intsrc *m) { int i; @@ -1026,7 +1006,7 @@ void __init mp_override_legacy_irq(u8 bu { int ioapic; int pin; - struct mp_config_intsrc mp_irq; + struct mpc_intsrc mp_irq; /* * Convert 'gsi' to 'ioapic.pin'. @@ -1034,7 +1014,7 @@ void __init mp_override_legacy_irq(u8 bu ioapic = mp_find_ioapic(gsi); if (ioapic < 0) return; - pin = gsi - mp_ioapic_routing[ioapic].gsi_base; + pin = mp_find_ioapic_pin(ioapic, gsi); /* * TBD: This check is for faulty timer entries, where the override @@ -1044,13 +1024,13 @@ void __init mp_override_legacy_irq(u8 bu if ((bus_irq == 0) && (trigger == 3)) trigger = 1; - mp_irq.mp_type = MP_INTSRC; - mp_irq.mp_irqtype = mp_INT; - mp_irq.mp_irqflag = (trigger << 2) | polarity; - mp_irq.mp_srcbus = MP_ISA_BUS; - mp_irq.mp_srcbusirq = bus_irq; /* IRQ */ - mp_irq.mp_dstapic = mp_ioapics[ioapic].mp_apicid; /* APIC ID */ - mp_irq.mp_dstirq = pin; /* INTIN# */ + mp_irq.type = MP_INTSRC; + mp_irq.irqtype = mp_INT; + mp_irq.irqflag = (trigger << 2) | polarity; + mp_irq.srcbus = MP_ISA_BUS; + mp_irq.srcbusirq = bus_irq; /* IRQ */ + mp_irq.dstapic = mp_ioapics[ioapic].apicid; /* APIC ID */ + mp_irq.dstirq = pin; /* INTIN# */ save_mp_irq(&mp_irq); } @@ -1060,7 +1040,7 @@ void __init mp_config_acpi_legacy_irqs(v int i; int ioapic; unsigned int dstapic; - struct mp_config_intsrc mp_irq; + struct mpc_intsrc mp_irq; #if defined (CONFIG_MCA) || defined (CONFIG_EISA) /* @@ -1085,7 +1065,7 @@ void __init mp_config_acpi_legacy_irqs(v ioapic = mp_find_ioapic(0); if (ioapic < 0) return; - dstapic = mp_ioapics[ioapic].mp_apicid; + dstapic = mp_ioapics[ioapic].apicid; /* * Use the default configuration for the IRQs 0-15. Unless @@ -1095,16 +1075,14 @@ void __init mp_config_acpi_legacy_irqs(v int idx; for (idx = 0; idx < mp_irq_entries; idx++) { - struct mp_config_intsrc *irq = mp_irqs + idx; + struct mpc_intsrc *irq = mp_irqs + idx; /* Do we already have a mapping for this ISA IRQ? */ - if (irq->mp_srcbus == MP_ISA_BUS - && irq->mp_srcbusirq == i) + if (irq->srcbus == MP_ISA_BUS && irq->srcbusirq == i) break; /* Do we already have a mapping for this IOAPIC pin */ - if (irq->mp_dstapic == dstapic && - irq->mp_dstirq == i) + if (irq->dstapic == dstapic && irq->dstirq == i) break; } @@ -1113,13 +1091,13 @@ void __init mp_config_acpi_legacy_irqs(v continue; /* IRQ already used */ } - mp_irq.mp_type = MP_INTSRC; - mp_irq.mp_irqflag = 0; /* Conforming */ - mp_irq.mp_srcbus = MP_ISA_BUS; - mp_irq.mp_dstapic = dstapic; - mp_irq.mp_irqtype = mp_INT; - mp_irq.mp_srcbusirq = i; /* Identity mapped */ - mp_irq.mp_dstirq = i; + mp_irq.type = MP_INTSRC; + mp_irq.irqflag = 0; /* Conforming */ + mp_irq.srcbus = MP_ISA_BUS; + mp_irq.dstapic = dstapic; + mp_irq.irqtype = mp_INT; + mp_irq.srcbusirq = i; /* Identity mapped */ + mp_irq.dstirq = i; save_mp_irq(&mp_irq); } @@ -1156,7 +1134,7 @@ int mp_register_gsi(u32 gsi, int trigger return gsi; } - ioapic_pin = gsi - mp_ioapic_routing[ioapic].gsi_base; + ioapic_pin = mp_find_ioapic_pin(ioapic, gsi); #ifdef CONFIG_X86_32 if (ioapic_renumber_irq) @@ -1230,22 +1208,22 @@ int mp_config_acpi_gsi(unsigned char num u32 gsi, int triggering, int polarity) { #ifdef CONFIG_X86_MPPARSE - struct mp_config_intsrc mp_irq; + struct mpc_intsrc mp_irq; int ioapic; if (!acpi_ioapic) return 0; /* print the entry should happen on mptable identically */ - mp_irq.mp_type = MP_INTSRC; - mp_irq.mp_irqtype = mp_INT; - mp_irq.mp_irqflag = (triggering == ACPI_EDGE_SENSITIVE ? 4 : 0x0c) | + mp_irq.type = MP_INTSRC; + mp_irq.irqtype = mp_INT; + mp_irq.irqflag = (triggering == ACPI_EDGE_SENSITIVE ? 4 : 0x0c) | (polarity == ACPI_ACTIVE_HIGH ? 1 : 3); - mp_irq.mp_srcbus = number; - mp_irq.mp_srcbusirq = (((devfn >> 3) & 0x1f) << 2) | ((pin - 1) & 3); + mp_irq.srcbus = number; + mp_irq.srcbusirq = (((devfn >> 3) & 0x1f) << 2) | ((pin - 1) & 3); ioapic = mp_find_ioapic(gsi); - mp_irq.mp_dstapic = mp_ioapic_routing[ioapic].apic_id; - mp_irq.mp_dstirq = gsi - mp_ioapic_routing[ioapic].gsi_base; + mp_irq.dstapic = mp_ioapic_routing[ioapic].apic_id; + mp_irq.dstirq = mp_find_ioapic_pin(ioapic, gsi); save_mp_irq(&mp_irq); #endif @@ -1372,7 +1350,7 @@ static void __init acpi_process_madt(voi if (!error) { acpi_lapic = 1; -#ifdef CONFIG_X86_GENERICARCH +#ifdef CONFIG_X86_BIGSMP generic_bigsmp_probe(); #endif /* @@ -1384,9 +1362,8 @@ static void __init acpi_process_madt(voi acpi_ioapic = 1; smp_found_config = 1; -#ifdef CONFIG_X86_32 - setup_apic_routing(); -#endif + if (apic->setup_apic_routing) + apic->setup_apic_routing(); } } if (error == -EINVAL) { Index: linux-2.6-tip/arch/x86/kernel/acpi/processor.c =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/acpi/processor.c +++ linux-2.6-tip/arch/x86/kernel/acpi/processor.c @@ -43,6 +43,11 @@ static void init_intel_pdc(struct acpi_p buf[0] = ACPI_PDC_REVISION_ID; buf[1] = 1; buf[2] = ACPI_PDC_C_CAPABILITY_SMP; + /* + * If mwait/monitor is unsupported, C2/C3_FFH will be disabled. + */ + if (!cpu_has(c, X86_FEATURE_MWAIT)) + buf[2] &= ~ACPI_PDC_C_C2C3_FFH; /* * The default of PDC_SMP_T_SWCOORD bit is set for intel x86 cpu so Index: linux-2.6-tip/arch/x86/kernel/acpi/realmode/wakeup.S =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/acpi/realmode/wakeup.S +++ linux-2.6-tip/arch/x86/kernel/acpi/realmode/wakeup.S @@ -3,8 +3,8 @@ */ #include #include -#include -#include +#include +#include #include .code16 Index: linux-2.6-tip/arch/x86/kernel/acpi/sleep.c =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/acpi/sleep.c +++ linux-2.6-tip/arch/x86/kernel/acpi/sleep.c @@ -101,6 +101,7 @@ int acpi_save_state_mem(void) stack_start.sp = temp_stack + sizeof(temp_stack); early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(smp_processor_id()); + initial_gs = per_cpu_offset(smp_processor_id()); #endif initial_code = (unsigned long)wakeup_long64; saved_magic = 0x123456789abcdef0; Index: linux-2.6-tip/arch/x86/kernel/acpi/wakeup_32.S =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/acpi/wakeup_32.S +++ linux-2.6-tip/arch/x86/kernel/acpi/wakeup_32.S @@ -1,7 +1,7 @@ .section .text.page_aligned #include #include -#include +#include # Copyright 2003, 2008 Pavel Machek , distribute under GPLv2 Index: linux-2.6-tip/arch/x86/kernel/acpi/wakeup_64.S =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/acpi/wakeup_64.S +++ linux-2.6-tip/arch/x86/kernel/acpi/wakeup_64.S @@ -1,8 +1,8 @@ .text #include #include -#include -#include +#include +#include #include #include Index: linux-2.6-tip/arch/x86/kernel/alternative.c =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c +++ linux-2.6-tip/arch/x86/kernel/alternative.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -12,7 +13,9 @@ #include #include #include +#include #include +#include #define MAX_PATCH_LEN (255-1) @@ -226,6 +229,7 @@ static void alternatives_smp_lock(u8 **s { u8 **ptr; + mutex_lock(&text_mutex); for (ptr = start; ptr < end; ptr++) { if (*ptr < text) continue; @@ -234,6 +238,7 @@ static void alternatives_smp_lock(u8 **s /* turn DS segment override prefix into lock prefix */ text_poke(*ptr, ((unsigned char []){0xf0}), 1); }; + mutex_unlock(&text_mutex); } static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end) @@ -243,6 +248,7 @@ static void alternatives_smp_unlock(u8 * if (noreplace_smp) return; + mutex_lock(&text_mutex); for (ptr = start; ptr < end; ptr++) { if (*ptr < text) continue; @@ -251,6 +257,7 @@ static void alternatives_smp_unlock(u8 * /* turn lock prefix into DS segment override prefix */ text_poke(*ptr, ((unsigned char []){0x3E}), 1); }; + mutex_unlock(&text_mutex); } struct smp_alt_module { @@ -414,9 +421,17 @@ void __init alternative_instructions(voi that might execute the to be patched code. Other CPUs are not running. */ stop_nmi(); -#ifdef CONFIG_X86_MCE - stop_mce(); -#endif + + /* + * Don't stop machine check exceptions while patching. + * MCEs only happen when something got corrupted and in this + * case we must do something about the corruption. + * Ignoring it is worse than a unlikely patching race. + * Also machine checks tend to be broadcast and if one CPU + * goes into machine check the others follow quickly, so we don't + * expect a machine check to cause undue problems during to code + * patching. + */ apply_alternatives(__alt_instructions, __alt_instructions_end); @@ -456,9 +471,6 @@ void __init alternative_instructions(voi (unsigned long)__smp_locks_end); restart_nmi(); -#ifdef CONFIG_X86_MCE - restart_mce(); -#endif } /** @@ -495,12 +507,13 @@ void *text_poke_early(void *addr, const * It means the size must be writable atomically and the address must be aligned * in a way that permits an atomic write. It also makes sure we fit on a single * page. + * + * Note: Must be called under text_mutex. */ void *__kprobes text_poke(void *addr, const void *opcode, size_t len) { unsigned long flags; char *vaddr; - int nr_pages = 2; struct page *pages[2]; int i; @@ -513,18 +526,21 @@ void *__kprobes text_poke(void *addr, co pages[1] = virt_to_page(addr + PAGE_SIZE); } BUG_ON(!pages[0]); - if (!pages[1]) - nr_pages = 1; - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); - BUG_ON(!vaddr); local_irq_save(flags); + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0])); + if (pages[1]) + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1])); + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0); memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len); - local_irq_restore(flags); - vunmap(vaddr); + clear_fixmap(FIX_TEXT_POKE0); + if (pages[1]) + clear_fixmap(FIX_TEXT_POKE1); + local_flush_tlb(); sync_core(); /* Could also do a CLFLUSH here to speed up CPU recovery; but that causes hangs on some VIA CPUs. */ for (i = 0; i < len; i++) BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]); + local_irq_restore(flags); return addr; } Index: linux-2.6-tip/arch/x86/kernel/amd_iommu.c =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/amd_iommu.c +++ linux-2.6-tip/arch/x86/kernel/amd_iommu.c @@ -22,10 +22,9 @@ #include #include #include +#include #include -#ifdef CONFIG_IOMMU_API #include -#endif #include #include #include @@ -1297,8 +1296,10 @@ static void __unmap_single(struct amd_io /* * The exported map_single function for dma_ops. */ -static dma_addr_t map_single(struct device *dev, phys_addr_t paddr, - size_t size, int dir) +static dma_addr_t map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + struct dma_attrs *attrs) { unsigned long flags; struct amd_iommu *iommu; @@ -1306,6 +1307,7 @@ static dma_addr_t map_single(struct devi u16 devid; dma_addr_t addr; u64 dma_mask; + phys_addr_t paddr = page_to_phys(page) + offset; INC_STATS_COUNTER(cnt_map_single); @@ -1340,8 +1342,8 @@ out: /* * The exported unmap_single function for dma_ops. */ -static void unmap_single(struct device *dev, dma_addr_t dma_addr, - size_t size, int dir) +static void unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs) { unsigned long flags; struct amd_iommu *iommu; @@ -1390,7 +1392,8 @@ static int map_sg_no_iommu(struct device * lists). */ static int map_sg(struct device *dev, struct scatterlist *sglist, - int nelems, int dir) + int nelems, enum dma_data_direction dir, + struct dma_attrs *attrs) { unsigned long flags; struct amd_iommu *iommu; @@ -1457,7 +1460,8 @@ unmap: * lists). */ static void unmap_sg(struct device *dev, struct scatterlist *sglist, - int nelems, int dir) + int nelems, enum dma_data_direction dir, + struct dma_attrs *attrs) { unsigned long flags; struct amd_iommu *iommu; @@ -1644,11 +1648,11 @@ static void prealloc_protection_domains( } } -static struct dma_mapping_ops amd_iommu_dma_ops = { +static struct dma_map_ops amd_iommu_dma_ops = { .alloc_coherent = alloc_coherent, .free_coherent = free_coherent, - .map_single = map_single, - .unmap_single = unmap_single, + .map_page = map_page, + .unmap_page = unmap_page, .map_sg = map_sg, .unmap_sg = unmap_sg, .dma_supported = amd_iommu_dma_supported, Index: linux-2.6-tip/arch/x86/kernel/apic.c =================================================================== --- linux-2.6-tip.orig/arch/x86/kernel/apic.c +++ /dev/null @@ -1,2223 +0,0 @@ -/* - * Local APIC handling, local APIC timers - * - * (c) 1999, 2000 Ingo Molnar - * - * Fixes - * Maciej W. Rozycki : Bits for genuine 82489DX APICs; - * thanks to Eric Gilmore - * and Rolf G. Tews - * for testing these extensively. - * Maciej W. Rozycki : Various updates and fixes. - * Mikael Pettersson : Power Management for UP-APIC. - * Pavel Machek and - * Mikael Pettersson : PM converted to driver model. - */ - -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -/* - * Sanity check - */ -#if ((SPURIOUS_APIC_VECTOR & 0x0F) != 0x0F) -# error SPURIOUS_APIC_VECTOR definition error -#endif - -#ifdef CONFIG_X86_32 -/* - * Knob to control our willingness to enable the local APIC. - * - * +1=force-enable - */ -static int force_enable_local_apic; -/* - * APIC command line parameters - */ -static int __init parse_lapic(char *arg) -{ - force_enable_local_apic = 1; - return 0; -} -early_param("lapic", parse_lapic); -/* Local APIC was disabled by the BIOS and enabled by the kernel */ -static int enabled_via_apicbase; - -#endif - -#ifdef CONFIG_X86_64 -static int apic_calibrate_pmtmr __initdata; -static __init int setup_apicpmtimer(char *s) -{ - apic_calibrate_pmtmr = 1; - notsc_setup(NULL); - return 0; -} -__setup("apicpmtimer", setup_apicpmtimer); -#endif - -#ifdef CONFIG_X86_64 -#define HAVE_X2APIC -#endif - -#ifdef HAVE_X2APIC -int x2apic; -/* x2apic enabled before OS handover */ -static int x2apic_preenabled; -static int disable_x2apic; -static __init int setup_nox2apic(char *str) -{ - disable_x2apic = 1; - setup_clear_cpu_cap(X86_FEATURE_X2APIC); - return 0; -} -early_param("nox2apic", setup_nox2apic); -#endif - -unsigned long mp_lapic_addr; -int disable_apic; -/* Disable local APIC timer from the kernel commandline or via dmi quirk */ -static int disable_apic_timer __cpuinitdata; -/* Local APIC timer works in C2 */ -int local_apic_timer_c2_ok; -EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok); - -int first_system_vector = 0xfe; - -/* - * Debug level, exported for io_apic.c - */ -unsigned int apic_verbosity; - -int pic_mode; - -/* Have we found an MP table */ -int smp_found_config; - -static struct resource lapic_resource = { - .name = "Local APIC", - .flags = IORESOURCE_MEM | IORESOURCE_BUSY, -}; - -static unsigned int calibration_result; - -static int lapic_next_event(unsigned long delta, - struct clock_event_device *evt); -static void lapic_timer_setup(enum clock_event_mode mode, - struct clock_event_device *evt); -static void lapic_timer_broadcast(const struct cpumask *mask); -static void apic_pm_activate(void); - -/* - * The local apic timer can be used for any function which is CPU local. - */ -static struct clock_event_device lapic_clockevent = { - .name = "lapic", - .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT - | CLOCK_EVT_FEAT_C3STOP | CLOCK_EVT_FEAT_DUMMY, - .shift = 32, - .set_mode = lapic_timer_setup, - .set_next_event = lapic_next_event, - .broadcast = lapic_timer_broadcast, - .rating = 100, - .irq = -1, -}; -static DEFINE_PER_CPU(struct clock_event_device, lapic_events); - -static unsigned long apic_phys; - -/* - * Get the LAPIC version - */ -static inline int lapic_get_version(void) -{ - return GET_APIC_VERSION(apic_read(APIC_LVR)); -} - -/* - * Check, if the APIC is integrated or a separate chip - */ -static inline int lapic_is_integrated(void) -{ -#ifdef CONFIG_X86_64 - return 1; -#else - return APIC_INTEGRATED(lapic_get_version()); -#endif -} - -/* - * Check, whether this is a modern or a first generation APIC - */ -static int modern_apic(void) -{ - /* AMD systems use old APIC versions, so check the CPU */ - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && - boot_cpu_data.x86 >= 0xf) - return 1; - return lapic_get_version() >= 0x14; -} - -/* - * Paravirt kernels also might be using these below ops. So we still - * use generic apic_read()/apic_write(), which might be pointing to different - * ops in PARAVIRT case. - */ -void xapic_wait_icr_idle(void) -{ - while (apic_read(APIC_ICR) & APIC_ICR_BUSY) - cpu_relax(); -} - -u32 safe_xapic_wait_icr_idle(void) -{ - u32 send_status; - int timeout; - - timeout = 0; - do { - send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; - if (!send_status) - break; - udelay(100); - } while (timeout++ < 1000); - - return send_status; -} - -void xapic_icr_write(u32 low, u32 id) -{ - apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id)); - apic_write(APIC_ICR, low); -} - -static u64 xapic_icr_read(void) -{ - u32 icr1, icr2; - - icr2 = apic_read(APIC_ICR2); - icr1 = apic_read(APIC_ICR); - - return icr1 | ((u64)icr2 << 32); -} - -static struct apic_ops xapic_ops = { - .read = native_apic_mem_read, - .write = native_apic_mem_write, - .icr_read = xapic_icr_read, - .icr_write = xapic_icr_write, - .wait_icr_idle = xapic_wait_icr_idle, - .safe_wait_icr_idle = safe_xapic_wait_icr_idle, -}; - -struct apic_ops __read_mostly *apic_ops = &xapic_ops; -EXPORT_SYMBOL_GPL(apic_ops); - -#ifdef HAVE_X2APIC -static void x2apic_wait_icr_idle(void) -{ - /* no need to wait for icr idle in x2apic */ - return; -} - -static u32 safe_x2apic_wait_icr_idle(void) -{ - /* no need to wait for icr idle in x2apic */ - return 0; -} - -void x2apic_icr_write(u32 low, u32 id) -{ - wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low); -} - -static u64 x2apic_icr_read(void) -{ - unsigned long val; - - rdmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), val); - return val; -} - -static struct apic_ops x2apic_ops = { - .read = native_apic_msr_read, - .write = native_apic_msr_write, - .icr_read = x2apic_icr_read, - .icr_write = x2apic_icr_write, - .wait_icr_idle = x2apic_wait_icr_idle, - .safe_wait_icr_idle = safe_x2apic_wait_icr_idle, -}; -#endif - -/** - * enable_NMI_through_LVT0 - enable NMI through local vector table 0 - */ -void __cpuinit enable_NMI_through_LVT0(void) -{ - unsigned int v; - - /* unmask and set to NMI */ - v = APIC_DM_NMI; - - /* Level triggered for 82489DX (32bit mode) */ - if (!lapic_is_integrated()) - v |= APIC_LVT_LEVEL_TRIGGER; - - apic_write(APIC_LVT0, v); -} - -#ifdef CONFIG_X86_32 -/** - * get_physical_broadcast - Get number of physical broadcast IDs - */ -int get_physical_broadcast(void) -{ - return modern_apic() ? 0xff : 0xf; -} -#endif - -/** - * lapic_get_maxlvt - get the maximum number of local vector table entries - */ -int lapic_get_maxlvt(void) -{ - unsigned int v; - - v = apic_read(APIC_LVR); - /* - * - we always have APIC integrated on 64bit mode - * - 82489DXs do not report # of LVT entries - */ - return APIC_INTEGRATED(GET_APIC_VERSION(v)) ? GET_APIC_MAXLVT(v) : 2; -} - -/* - * Local APIC timer - */ - -/* Clock divisor */ -#define APIC_DIVISOR 16 - -/* - * This function sets up the local APIC timer, with a timeout of - * 'clocks' APIC bus clock. During calibration we actually call - * this function twice on the boot CPU, once with a bogus timeout - * value, second time for real. The other (noncalibrating) CPUs - * call this function only once, with the real, calibrated value. - * - * We do reads before writes even if unnecessary, to get around the - * P5 APIC double write bug. - */ -static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) -{ - unsigned int lvtt_value, tmp_value; - - lvtt_value = LOCAL_TIMER_VECTOR; - if (!oneshot) - lvtt_value |= APIC_LVT_TIMER_PERIODIC; - if (!lapic_is_integrated()) - lvtt_value |= SET_APIC_TIMER_BASE(APIC_TIMER_BASE_DIV); - - if (!irqen) - lvtt_value |= APIC_LVT_MASKED; - - apic_write(APIC_LVTT, lvtt_value); - - /* - * Divide PICLK by 16 - */ - tmp_value = apic_read(APIC_TDCR); - apic_write(APIC_TDCR, - (tmp_value & ~(APIC_TDR_DIV_1 | APIC_TDR_DIV_TMBASE)) | - APIC_TDR_DIV_16); - - if (!oneshot) - apic_write(APIC_TMICT, clocks / APIC_DIVISOR); -} - -/* - * Setup extended LVT, AMD specific (K8, family 10h) - * - * Vector mappings are hard coded. On K8 only offset 0 (APIC500) and - * MCE interrupts are supported. Thus MCE offset must be set to 0. - * - * If mask=1, the LVT entry does not generate interrupts while mask=0 - * enables the vector. See also the BKDGs. - */ - -#define APIC_EILVT_LVTOFF_MCE 0 -#define APIC_EILVT_LVTOFF_IBS 1 - -static void setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask) -{ - unsigned long reg = (lvt_off << 4) + APIC_EILVT0; - unsigned int v = (mask << 16) | (msg_type << 8) | vector; - - apic_write(reg, v); -} - -u8 setup_APIC_eilvt_mce(u8 vector, u8 msg_type, u8 mask) -{ - setup_APIC_eilvt(APIC_EILVT_LVTOFF_MCE, vector, msg_type, mask); - return APIC_EILVT_LVTOFF_MCE; -} - -u8 setup_APIC_eilvt_ibs(u8 vector, u8 msg_type, u8 mask) -{ - setup_APIC_eilvt(APIC_EILVT_LVTOFF_IBS, vector, msg_type, mask); - return APIC_EILVT_LVTOFF_IBS; -} -EXPORT_SYMBOL_GPL(setup_APIC_eilvt_ibs); - -/* - * Program the next event, relative to now - */ -static int lapic_next_event(unsigned long delta, - struct clock_event_device *evt) -{ - apic_write(APIC_TMICT, delta); - return 0; -} - -/* - * Setup the lapic timer in periodic or oneshot mode - */ -static void lapic_timer_setup(enum clock_event_mode mode, - struct clock_event_device *evt) -{ - unsigned long flags; - unsigned int v; - - /* Lapic used as dummy for broadcast ? */ - if (evt->features & CLOCK_EVT_FEAT_DUMMY) - return; - - local_irq_save(flags); - - switch (mode) { - case CLOCK_EVT_MODE_PERIODIC: - case CLOCK_EVT_MODE_ONESHOT: - __setup_APIC_LVTT(calibration_result, - mode != CLOCK_EVT_MODE_PERIODIC, 1); - break; - case CLOCK_EVT_MODE_UNUSED: - case CLOCK_EVT_MODE_SHUTDOWN: - v = apic_read(APIC_LVTT); - v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); - apic_write(APIC_LVTT, v); - apic_write(APIC_TMICT, 0xffffffff); - break; - case CLOCK_EVT_MODE_RESUME: - /* Nothing to do here */ - break; - } - - local_irq_restore(flags); -} - -/* - * Local APIC timer broadcast function - */ -static void lapic_timer_broadcast(const struct cpumask *mask) -{ -#ifdef CONFIG_SMP - send_IPI_mask(mask, LOCAL_TIMER_VECTOR); -#endif -} - -/* - * Setup the local APIC timer for this CPU. Copy the initilized values - * of the boot CPU and register the clock event in the framework. - */ -static void __cpuinit setup_APIC_timer(void) -{ - struct clock_event_device *levt = &__get_cpu_var(lapic_events); - - memcpy(levt, &lapic_clockevent, sizeof(*levt)); - levt->cpumask = cpumask_of(smp_processor_id()); - - clockevents_register_device(levt); -} - -/* - * In this functions we calibrate APIC bus clocks to the external timer. - * - * We want to do the calibration only once since we want to have local timer - * irqs syncron. CPUs connected by the same APIC bus have the very same bus - * frequency. - * - * This was previously done by reading the PIT/HPET and waiting for a wrap - * around to find out, that a tick has elapsed. I have a box, where the PIT - * readout is broken, so it never gets out of the wait loop again. This was - * also reported by others. - * - * Monitoring the jiffies value is inaccurate and the clockevents - * infrastructure allows us to do a simple substitution of the interrupt - * handler. - * - * The calibration routine also uses the pm_timer when possible, as the PIT - * happens to run way too slow (factor 2.3 on my VAIO CoreDuo, which goes - * back to normal later in the boot process). - */ - -#define LAPIC_CAL_LOOPS (HZ/10) - -static __initdata int lapic_cal_loops = -1; -static __initdata long lapic_cal_t1, lapic_cal_t2; -static __initdata unsigned long long lapic_cal_tsc1, lapic_cal_tsc2; -static __initdata unsigned long lapic_cal_pm1, lapic_cal_pm2; -static __initdata unsigned long lapic_cal_j1, lapic_cal_j2; - -/* - * Temporary interrupt handler. - */ -static void __init lapic_cal_handler(struct clock_event_device *dev) -{ - unsigned long long tsc = 0; - long tapic = apic_read(APIC_TMCCT); - unsigned long pm = acpi_pm_read_early(); - - if (cpu_has_tsc) - rdtscll(tsc); - - switch (lapic_cal_loops++) { - case 0: - lapic_cal_t1 = tapic; - lapic_cal_tsc1 = tsc; - lapic_cal_pm1 = pm; - lapic_cal_j1 = jiffies; - break; - - case LAPIC_CAL_LOOPS: - lapic_cal_t2 = tapic; - lapic_cal_tsc2 = tsc; - if (pm < lapic_cal_pm1) - pm += ACPI_PM_OVRRUN; - lapic_cal_pm2 = pm; - lapic_cal_j2 = jiffies; - break; - } -} - -static int __init calibrate_by_pmtimer(long deltapm, long *delta) -{ - const long pm_100ms = PMTMR_TICKS_PER_SEC / 10; - const long pm_thresh = pm_100ms / 100; - unsigned long mult; - u64 res; - -#ifndef CONFIG_X86_PM_TIMER - return -1; -#endif - - apic_printk(APIC_VERBOSE, "... PM timer delta = %ld\n", deltapm); - - /* Check, if the PM timer is available */ - if (!deltapm) - return -1; - - mult = clocksource_hz2mult(PMTMR_TICKS_PER_SEC, 22); - - if (deltapm > (pm_100ms - pm_thresh) && - deltapm < (pm_100ms + pm_thresh)) { - apic_printk(APIC_VERBOSE, "... PM timer result ok\n"); - } else { - res = (((u64)deltapm) * mult) >> 22; - do_div(res, 1000000); - pr_warning("APIC calibration not consistent " - "with PM Timer: %ldms instead of 100ms\n", - (long)res); - /* Correct the lapic counter value */ - res = (((u64)(*delta)) * pm_100ms); - do_div(res, deltapm); - pr_info("APIC delta adjusted to PM-Timer: " - "%lu (%ld)\n", (unsigned long)res, *delta); - *delta = (long)res; - } - - return 0; -} - -static int __init calibrate_APIC_clock(void) -{ - struct clock_event_device *levt = &__get_cpu_var(lapic_events); - void (*real_handler)(struct clock_event_device *dev); - unsigned long deltaj; - long delta; - int pm_referenced = 0; - - local_irq_disable(); - - /* Replace the global interrupt handler */ - real_handler = global_clock_event->event_handler; - global_clock_event->event_handler = lapic_cal_handler; - - /* - * Setup the APIC counter to maximum. There is no way the lapic - * can underflow in the 100ms detection time frame - */ - __setup_APIC_LVTT(0xffffffff, 0, 0); - - /* Let the interrupts run */ - local_irq_enable(); - - while (lapic_cal_loops <= LAPIC_CAL_LOOPS) - cpu_relax(); - - local_irq_disable(); - - /* Restore the real event handler */ - global_clock_event->event_handler = real_handler; - - /* Build delta t1-t2 as apic timer counts down */ - delta = lapic_cal_t1 - lapic_cal_t2; - apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta); - - /* we trust the PM based calibration if possible */ - pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 - lapic_cal_pm1, - &delta); - - /* Calculate the scaled math multiplication factor */ - lapic_clockevent.mult = div_sc(delta, TICK_NSEC * LAPIC_CAL_LOOPS, - lapic_clockevent.shift); - lapic_clockevent.max_delta_ns = - clockevent_delta2ns(0x7FFFFF, &lapic_clockevent); - lapic_clockevent.min_delta_ns = - clockevent_delta2ns(0xF, &lapic_clockevent); - - calibration_result = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS; - - apic_printk(APIC_VERBOSE, "..... delta %ld\n", delta); - apic_printk(APIC_VERBOSE, "..... mult: %ld\n", lapic_clockevent.mult); - apic_printk(APIC_VERBOSE, "..... calibration result: %u\n", - calibration_result); - - if (cpu_has_tsc) { - delta = (long)(lapic_cal_tsc2 - lapic_cal_tsc1); - apic_printk(APIC_VERBOSE, "..... CPU clock speed is " - "%ld.%04ld MHz.\n", - (delta / LAPIC_CAL_LOOPS) / (1000000 / HZ), - (delta / LAPIC_CAL_LOOPS) % (1000000 / HZ)); - } - - apic_printk(APIC_VERBOSE, "..... host bus clock speed is " - "%u.%04u MHz.\n", - calibration_result / (1000000 / HZ), - calibration_result % (1000000 / HZ)); - - /* - * Do a sanity check on the APIC calibration result - */ - if (calibration_result < (1000000 / HZ)) { - local_irq_enable(); - pr_warning("APIC frequency too slow, disabling apic timer\n"); - return -1; - } - - levt->features &= ~CLOCK_EVT_FEAT_DUMMY; - - /* - * PM timer calibration failed or not turned on - * so lets try APIC timer based calibration - */ - if (!pm_referenced) { - apic_printk(APIC_VERBOSE, "... verify APIC timer\n"); - - /* - * Setup the apic timer manually - */ - levt->event_handler = lapic_cal_handler; - lapic_timer_setup(CLOCK_EVT_MODE_PERIODIC, levt); - lapic_cal_loops = -1; - - /* Let the interrupts run */ - local_irq_enable(); - - while (lapic_cal_loops <= LAPIC_CAL_LOOPS) - cpu_relax(); - - /* Stop the lapic timer */ - lapic_timer_setup(CLOCK_EVT_MODE_SHUTDOWN, levt); - - /* Jiffies delta */ - deltaj = lapic_cal_j2 - lapic_cal_j1; - apic_printk(APIC_VERBOSE, "... jiffies delta = %lu\n", deltaj); - - /* Check, if the jiffies result is consistent */ - if (deltaj >= LAPIC_CAL_LOOPS-2 && deltaj <= LAPIC_CAL_LOOPS+2) - apic_printk(APIC_VERBOSE, "... jiffies result ok\n"); - else - levt->features |= CLOCK_EVT_FEAT_DUMMY; - } else - local_irq_enable(); - - if (levt->features & CLOCK_EVT_FEAT_DUMMY) { - pr_warning("APIC timer disabled due to verification failure\n"); - return -1; - } - - return 0; -} - -/* - * Setup the boot APIC - * - * Calibrate and verify the result. - */ -void __init setup_boot_APIC_clock(void) -{ - /* - * The local apic timer can be disabled via the kernel - * commandline or from the CPU detection code. Register the lapic - * timer as a dummy clock event source on SMP systems, so the - * broadcast mechanism is used. On UP systems simply ignore it. - */ - if (disable_apic_timer) { - pr_info("Disabling APIC timer\n"); - /* No broadcast on UP ! */ - if (num_possible_cpus() > 1) { - lapic_clockevent.mult = 1; - setup_APIC_timer(); - } - return; - } - - apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n" - "calibrating APIC timer ...\n"); - - if (calibrate_APIC_clock()) { - /* No broadcast on UP ! */ - if (num_possible_cpus() > 1) - setup_APIC_timer(); - return; - } - - /* - * If nmi_watchdog is set to IO_APIC, we need the - * PIT/HPET going. Otherwise register lapic as a dummy - * device. - */ - if (nmi_watchdog != NMI_IO_APIC) - lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY; - else - pr_warning("APIC timer registered as dummy," - " due to nmi_watchdog=%d!\n", nmi_watchdog); - - /* Setup the lapic or request the broadcast */ - setup_APIC_timer(); -} - -void __cpuinit setup_secondary_APIC_clock(void) -{ - setup_APIC_timer(); -} - -/* - * The guts of the apic timer interrupt - */ -static void local_apic_timer_interrupt(void) -{ - int cpu = smp_processor_id(); - struct clock_event_device *evt = &per_cpu(lapic_events, cpu); - - /* - * Normally we should not be here till LAPIC has been initialized but - * in some cases like kdump, its possible that there is a pending LAPIC - * timer interrupt from previous kernel's context and is delivered in - * new kernel the moment interrupts are enabled. - * - * Interrupts are enabled early and LAPIC is setup much later, hence - * its possible that when we get here evt->event_handler is NULL. - * Check for event_handler being NULL and discard the interrupt as - * spurious. - */ - if (!evt->event_handler) { - pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu); - /* Switch it off */ - lapic_timer_setup(CLOCK_EVT_MODE_SHUTDOWN, evt); - return; - } - - /* - * the NMI deadlock-detector uses this. - */ - inc_irq_stat(apic_timer_irqs); - - evt->event_handler(evt); -} - -/* - * Local APIC timer interrupt. This is the most natural way for doing - * local interrupts, but local timer interrupts can be emulated by - * broadcast interrupts too. [in case the hw doesn't support APIC timers] - * - * [ if a single-CPU system runs an SMP kernel then we call the local - * interrupt as well. Thus we cannot inline the local irq ... ] - */ -void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) -{ - struct pt_regs *old_regs = set_irq_regs(regs); - - /* - * NOTE! We'd better ACK the irq immediately, - * because timer handling can be slow. - */ - ack_APIC_irq(); - /* - * update_process_times() expects us to have done irq_enter(). - * Besides, if we don't timer interrupts ignore the global - * interrupt lock, which is the WrongThing (tm) to do. - */ - exit_idle(); - irq_enter(); - local_apic_timer_interrupt(); - irq_exit(); - - set_irq_regs(old_regs); -} - -int setup_profiling_timer(unsigned int multiplier) -{ - return -EINVAL; -} - -/* - * Local APIC start and shutdown - */ - -/** - * clear_local_APIC - shutdown the local APIC - * - * This is called, when a CPU is disabled and before rebooting, so the state of - * the local APIC has no dangling leftovers. Also used to cleanout any BIOS - * leftovers during boot. - */ -void clear_local_APIC(void) -{ - int maxlvt; - u32 v; - - /* APIC hasn't been mapped yet */ - if (!apic_phys) - return; - - maxlvt = lapic_get_maxlvt(); - /* - * Masking an LVT entry can trigger a local APIC error - * if the vector is zero. Mask LVTERR first to prevent this. - */ - if (maxlvt >= 3) { - v = ERROR_APIC_VECTOR; /* any non-zero vector will do */ - apic_write(APIC_LVTERR, v | APIC_LVT_MASKED); - } - /* - * Careful: we have to set masks only first to deassert - * any level-triggered sources. - */ - v = apic_read(APIC_LVTT); - apic_write(APIC_LVTT, v | APIC_LVT_MASKED); - v = apic_read(APIC_LVT0); - apic_write(APIC_LVT0, v | APIC_LVT_MASKED); - v = apic_read(APIC_LVT1); - apic_write(APIC_LVT1, v | APIC_LVT_MASKED); - if (maxlvt >= 4) { - v = apic_read(APIC_LVTPC); - apic_write(APIC_LVTPC, v | APIC_LVT_MASKED); - } - - /* lets not touch this if we didn't frob it */ -#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL) - if (maxlvt >= 5) { - v = apic_read(APIC_LVTTHMR); - apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED); - } -#endif - /* - * Clean APIC state for other OSs: - */ - apic_write(APIC_LVTT, APIC_LVT_MASKED); - apic_write(APIC_LVT0, APIC_LVT_MASKED); - apic_write(APIC_LVT1, APIC_LVT_MASKED); - if (maxlvt >= 3) - apic_write(APIC_LVTERR, APIC_LVT_MASKED); - if (maxlvt >= 4) - apic_write(APIC_LVTPC, APIC_LVT_MASKED); - - /* Integrated APIC (!82489DX) ? */ - if (lapic_is_integrated()) { - if (maxlvt > 3) - /* Clear ESR due to Pentium errata 3AP and 11AP */ - apic_write(APIC_ESR, 0); - apic_read(APIC_ESR); - } -} - -/** - * disable_local_APIC - clear and disable the local APIC - */ -void disable_local_APIC(void) -{ - unsigned int value; - - /* APIC hasn't been mapped yet */ - if (!apic_phys) - return; - - clear_local_APIC(); - - /* - * Disable APIC (implies clearing of registers - * for 82489DX!). - */ - value = apic_read(APIC_SPIV); - value &= ~APIC_SPIV_APIC_ENABLED; - apic_write(APIC_SPIV, value); - -#ifdef CONFIG_X86_32 - /* - * When LAPIC was disabled by the BIOS and enabled by the kernel, - * restore the disabled state. - */ - if (enabled_via_apicbase) { - unsigned int l, h; - - rdmsr(MSR_IA32_APICBASE, l, h); - l &= ~MSR_IA32_APICBASE_ENABLE; - wrmsr(MSR_IA32_APICBASE, l, h); - } -#endif -} - -/* - * If Linux enabled the LAPIC against the BIOS default disable it down before - * re-entering the BIOS on shutdown. Otherwise the BIOS may get confused and - * not power-off. Additionally clear all LVT entries before disable_local_APIC - * for the case where Linux didn't enable the LAPIC. - */ -void lapic_shutdown(void) -{ - unsigned long flags; - - if (!cpu_has_apic) - return; - - local_irq_save(flags); - -#ifdef CONFIG_X86_32 - if (!enabled_via_apicbase) - clear_local_APIC(); - else -#endif - disable_local_APIC(); - - - local_irq_restore(flags); -} - -/* - * This is to verify that we're looking at a real local APIC. - * Check these against your board if the CPUs aren't getting - * started for no apparent reason. - */ -int __init verify_local_APIC(void) -{ - unsigned int reg0, reg1; - - /* - * The version register is read-only in a real APIC. - */ - reg0 = apic_read(APIC_LVR); - apic_printk(APIC_DEBUG, "Getting VERSION: %x\n", reg0); - apic_write(APIC_LVR, reg0 ^ APIC_LVR_MASK); - reg1 = apic_read(APIC_LVR); - apic_printk(APIC_DEBUG, "Getting VERSION: %x\n", reg1); - - /* - * The two version reads above should print the same - * numbers. If the second one is different, then we - * poke at a non-APIC. - */ - if (reg1 != reg0) - return 0; - - /* - * Check if the version looks reasonably. - */ - reg1 = GET_APIC_VERSION(reg0); - if (reg1 == 0x00 || reg1 == 0xff) - return 0; - reg1 = lapic_get_maxlvt(); - if (reg1 < 0x02 || reg1 == 0xff) - return 0; - - /* - * The ID register is read/write in a real APIC. - */ - reg0 = apic_read(APIC_ID); - apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg0); - apic_write(APIC_ID, reg0 ^ APIC_ID_MASK); - reg1 = apic_read(APIC_ID); - apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg1); - apic_write(APIC_ID, reg0); - if (reg1 != (reg0 ^ APIC_ID_MASK)) - return 0; - - /* - * The next two are just to see if we have sane values. - * They're only really relevant if we're in Virtual Wire - * compatibility mode, but most boxes are anymore. - */ - reg0 = apic_read(APIC_LVT0); - apic_printk(APIC_DEBUG, "Getting LVT0: %x\n", reg0); - reg1 = apic_read(APIC_LVT1); - apic_printk(APIC_DEBUG, "Getting LVT1: %x\n", reg1); - - return 1; -} - -/** - * sync_Arb_IDs - synchronize APIC bus arbitration IDs - */ -void __init sync_Arb_IDs(void) -{ - /* - * Unsupported on P4 - see Intel Dev. Manual Vol. 3, Ch. 8.6.1 And not - * needed on AMD. - */ - if (modern_apic() || boot_cpu_data.x86_vendor == X86_VENDOR_AMD) - return; - - /* - * Wait for idle. - */ - apic_wait_icr_idle(); - - apic_printk(APIC_DEBUG, "Synchronizing Arb IDs.\n"); - apic_write(APIC_ICR, APIC_DEST_ALLINC | - APIC_INT_LEVELTRIG | APIC_DM_INIT); -} - -/* - * An initial setup of the virtual wire mode. - */ -void __init init_bsp_APIC(void) -{ - unsigned int value; - - /* - * Don't do the setup now if we have a SMP BIOS as the - * through-I/O-APIC virtual wire mode might be active. - */ - if (smp_found_config || !cpu_has_apic) - return; - - /* - * Do not trust the local APIC being empty at bootup. - */ - clear_local_APIC(); - - /* - * Enable APIC. - */ - value = apic_read(APIC_SPIV); - value &= ~APIC_VECTOR_MASK; - value |= APIC_SPIV_APIC_ENABLED; - -#ifdef CONFIG_X86_32 - /* This bit is reserved on P4/Xeon and should be cleared */ - if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && - (boot_cpu_data.x86 == 15)) - value &= ~APIC_SPIV_FOCUS_DISABLED; - else -#endif - value |= APIC_SPIV_FOCUS_DISABLED; - value |= SPURIOUS_APIC_VECTOR; - apic_write(APIC_SPIV, value); - - /* - * Set up the virtual wire mode. - */ - apic_write(APIC_LVT0, APIC_DM_EXTINT); - value = APIC_DM_NMI; - if (!lapic_is_integrated()) /* 82489DX */ - value |= APIC_LVT_LEVEL_TRIGGER; - apic_write(APIC_LVT1, value); -} - -static void __cpuinit lapic_setup_esr(void) -{ - unsigned int oldvalue, value, maxlvt; - - if (!lapic_is_integrated()) { - pr_info("No ESR for 82489DX.\n"); - return; - } - - if (esr_disable) { - /* - * Something untraceable is creating bad interrupts on - * secondary quads ... for the moment, just leave the - * ESR disabled - we can't do anything useful with the - * errors anyway - mbligh - */ - pr_info("Leaving ESR disabled.\n"); - return; - } - - maxlvt = lapic_get_maxlvt(); - if (maxlvt > 3) /* Due to the Pentium erratum 3AP. */ - apic_write(APIC_ESR, 0); - oldvalue = apic_read(APIC_ESR); - - /* enables sending errors */ - value = ERROR_APIC_VECTOR; - apic_write(APIC_LVTERR, value); - - /* - * spec says clear errors after enabling vector. - */ - if (maxlvt > 3) - apic_write(APIC_ESR, 0); - value = apic_read(APIC_ESR); - if (value != oldvalue) - apic_printk(APIC_VERBOSE, "ESR value before enabling " - "vector: 0x%08x after: 0x%08x\n", - oldvalue, value); -} - - -/** - * setup_local_APIC - setup the local APIC - */ -void __cpuinit setup_local_APIC(void) -{ - unsigned int value; - int i, j; - -#ifdef CONFIG_X86_32 - /* Pound the ESR really hard over the head with a big hammer - mbligh */ - if (lapic_is_integrated() && esr_disable) { - apic_write(APIC_ESR, 0); - apic_write(APIC_ESR, 0); - apic_write(APIC_ESR, 0); - apic_write(APIC_ESR, 0); - } -#endif - - preempt_disable(); - - /* - * Double-check whether this APIC is really registered. - * This is meaningless in clustered apic mode, so we skip it. - */ - if (!apic_id_registered()) - BUG(); - - /* - * Intel recommends to set DFR, LDR and TPR before enabling - * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel - * document number 292116). So here it goes... - */ - init_apic_ldr(); - - /* - * Set Task Priority to 'accept all'. We never change this - * later on. - */ - value = apic_read(APIC_TASKPRI); - value &= ~APIC_TPRI_MASK; - apic_write(APIC_TASKPRI, value); - - /* - * After a crash, we no longer service the interrupts and a pending - * interrupt from previous kernel might still have ISR bit set. - * - * Most probably by now CPU has serviced that pending interrupt and - * it might not have done the ack_APIC_irq() because it thought, - * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it - * does not clear the ISR bit and cpu thinks it has already serivced - * the interrupt. Hence a vector might get locked. It was noticed - * for timer irq (vector 0x31). Issue an extra EOI to clear ISR. - */ - for (i = APIC_ISR_NR - 1; i >= 0; i--) { - value = apic_read(APIC_ISR + i*0x10); - for (j = 31; j >= 0; j--) { - if (value & (1< 1) || - (boot_cpu_data.x86 >= 15)) - break; - goto no_apic; - case X86_VENDOR_INTEL: - if (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15 || - (boot_cpu_data.x86 == 5 && cpu_has_apic)) - break; - goto no_apic; - default: - goto no_apic; - } - - if (!cpu_has_apic) { - /* - * Over-ride BIOS and try to enable the local APIC only if - * "lapic" specified. - */ - if (!force_enable_local_apic) { - pr_info("Local APIC disabled by BIOS -- " - "you can enable it with \"lapic\"\n"); - return -1; - } - /* - * Some BIOSes disable the local APIC in the APIC_BASE - * MSR. This can only be done in software for Intel P6 or later - * and AMD K7 (Model > 1) or later. - */ - rdmsr(MSR_IA32_APICBASE, l, h); - if (!(l & MSR_IA32_APICBASE_ENABLE)) { - pr_info("Local APIC disabled by BIOS -- reenabling.\n"); - l &= ~MSR_IA32_APICBASE_BASE; - l |= MSR_IA32_APICBASE_ENABLE | APIC_DEFAULT_PHYS_BASE; - wrmsr(MSR_IA32_APICBASE, l, h); - enabled_via_apicbase = 1; - } - } - /* - * The APIC feature bit should now be enabled - * in `cpuid' - */ - features = cpuid_edx(1); - if (!(features & (1 << X86_FEATURE_APIC))) { - pr_warning("Could not enable APIC!\n"); - return -1; - } - set_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC); - mp_lapic_addr = APIC_DEFAULT_PHYS_BASE; - - /* The BIOS may have set up the APIC at some other address */ - rdmsr(MSR_IA32_APICBASE, l, h); - if (l & MSR_IA32_APICBASE_ENABLE) - mp_lapic_addr = l & MSR_IA32_APICBASE_BASE; - - pr_info("Found and enabled local APIC!\n"); - - apic_pm_activate(); - - return 0; - -no_apic: - pr_info("No local APIC present or hardware disabled\n"); - return -1; -} -#endif - -#ifdef CONFIG_X86_64 -void __init early_init_lapic_mapping(void) -{ - unsigned long phys_addr; - - /* - * If no local APIC can be found then go out - * : it means there is no mpatable and MADT - */ - if (!smp_found_config) - return; - - phys_addr = mp_lapic_addr; - - set_fixmap_nocache(FIX_APIC_BASE, phys_addr); - apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n", - APIC_BASE, phys_addr); - - /* - * Fetch the APIC ID of the BSP in case we have a - * default configuration (or the MP table is broken). - */ - boot_cpu_physical_apicid = read_apic_id(); -} -#endif - -/** - * init_apic_mappings - initialize APIC mappings - */ -void __init init_apic_mappings(void) -{ -#ifdef HAVE_X2APIC - if (x2apic) { - boot_cpu_physical_apicid = read_apic_id(); - return; - } -#endif - - /* - * If no local APIC can be found then set up a fake all - * zeroes page to simulate the local APIC and another - * one for the IO-APIC. - */ - if (!smp_found_config && detect_init_APIC()) { - apic_phys = (unsigned long) alloc_bootmem_pages(PAGE_SIZE); - apic_phys = __pa(apic_phys); - } else - apic_phys = mp_lapic_addr; - - set_fixmap_nocache(FIX_APIC_BASE, apic_phys); - apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n", - APIC_BASE, apic_phys); - - /* - * Fetch the APIC ID of the BSP in case we have a - * default configuration (or the MP table is broken). - */ - if (boot_cpu_physical_apicid == -1U) - boot_cpu_physical_apicid = read_apic_id(); -} - -/* - * This initializes the IO-APIC and APIC hardware if this is - * a UP kernel. - */ -int apic_version[MAX_APICS]; - -int __init APIC_init_uniprocessor(void) -{ -#ifdef CONFIG_X86_64 - if (disable_apic) { - pr_info("Apic disabled\n"); - return -1; - } - if (!cpu_has_apic) { - disable_apic = 1; - pr_info("Apic disabled by BIOS\n"); - return -1; - } -#else - if (!smp_found_config && !cpu_has_apic) - return -1; - - /* - * Complain if the BIOS pretends there is one. - */ - if (!cpu_has_apic && - APIC_INTEGRATED(apic_version[boot_cpu_physical_apicid])) { - pr_err("BIOS bug, local APIC 0x%x not detected!...\n", - boot_cpu_physical_apicid); - clear_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC); - return -1; - } -#endif - -#ifdef HAVE_X2APIC - enable_IR_x2apic(); -#endif -#ifdef CONFIG_X86_64 - setup_apic_routing(); -#endif - - verify_local_APIC(); - connect_bsp_APIC(); - -#ifdef CONFIG_X86_64 - apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid)); -#else - /* - * Hack: In case of kdump, after a crash, kernel might be booting - * on a cpu with non-zero lapic id. But boot_cpu_physical_apicid - * might be zero if read from MP tables. Get it from LAPIC. - */ -# ifdef CONFIG_CRASH_DUMP - boot_cpu_physical_apicid = read_apic_id(); -# endif -#endif - physid_set_mask_of_physid(boot_cpu_physical_apicid, &phys_cpu_present_map); - setup_local_APIC(); - -#ifdef CONFIG_X86_64 - /* - * Now enable IO-APICs, actually call clear_IO_APIC - * We need clear_IO_APIC before enabling vector on BP - */ - if (!skip_ioapic_setup && nr_ioapics) - enable_IO_APIC(); -#endif - -#ifdef CONFIG_X86_IO_APIC - if (!smp_found_config || skip_ioapic_setup || !nr_ioapics) -#endif - localise_nmi_watchdog(); - end_local_APIC_setup(); - -#ifdef CONFIG_X86_IO_APIC - if (smp_found_config && !skip_ioapic_setup && nr_ioapics) - setup_IO_APIC(); -# ifdef CONFIG_X86_64 - else - nr_ioapics = 0; -# endif -#endif - -#ifdef CONFIG_X86_64 - setup_boot_APIC_clock(); - check_nmi_watchdog(); -#else - setup_boot_clock(); -#endif - - return 0; -} - -/* - * Local APIC interrupts - */ - -/* - * This interrupt should _never_ happen with our APIC/SMP architecture - */ -void smp_spurious_interrupt(struct pt_regs *regs) -{ - u32 v; - - exit_idle(); - irq_enter(); - /* - * Check if this really is a spurious interrupt and ACK it - * if it is a vectored one. Just in case... - * Spurious interrupts should not be ACKed. - */ - v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); - if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) - ack_APIC_irq(); - - inc_irq_stat(irq_spurious_count); - - /* see sw-dev-man vol 3, chapter 7.4.13.5 */ - pr_info("spurious APIC interrupt on CPU#%d, " - "should never happen.\n", smp_processor_id()); - irq_exit(); -} - -/* - * This interrupt should never happen with our APIC/SMP architecture - */ -void smp_error_interrupt(struct pt_regs *regs) -{ - u32 v, v1; - - exit_idle(); - irq_enter(); - /* First tickle the hardware, only then report what went on. -- REW */ - v = apic_read(APIC_ESR); - apic_write(APIC_ESR, 0); - v1 = apic_read(APIC_ESR); - ack_APIC_irq(); - atomic_inc(&irq_err_count); - - /* - * Here is what the APIC error bits mean: - * 0: Send CS error - * 1: Receive CS error - * 2: Send accept error - * 3: Receive accept error - * 4: Reserved - * 5: Send illegal vector - * 6: Received illegal vector - * 7: Illegal register address - */ - pr_debug("APIC error on CPU%d: %02x(%02x)\n", - smp_processor_id(), v , v1); - irq_exit(); -} - -/** - * connect_bsp_APIC - attach the APIC to the interrupt system - */ -void __init connect_bsp_APIC(void) -{ -#ifdef CONFIG_X86_32 - if (pic_mode) { - /* - * Do not trust the local APIC being empty at bootup. - */ - clear_local_APIC(); - /* - * PIC mode, enable APIC mode in the IMCR, i.e. connect BSP's - * local APIC to INT and NMI lines. - */ - apic_printk(APIC_VERBOSE, "leaving PIC mode, " - "enabling APIC mode.\n"); - outb(0x70, 0x22); - outb(0x01, 0x23); - } -#endif - enable_apic_mode(); -} - -/** - * disconnect_bsp_APIC - detach the APIC from the interrupt system - * @virt_wire_setup: indicates, whether virtual wire mode is selected - * - * Virtual wire mode is necessary to deliver legacy interrupts even when the - * APIC is disabled. - */ -void disconnect_bsp_APIC(int virt_wire_setup) -{ - unsigned int value; - -#ifdef CONFIG_X86_32 - if (pic_mode) { - /* - * Put the board back into PIC mode (has an effect only on - * certain older boards). Note that APIC interrupts, including - * IPIs, won't work beyond this point! The only exception are - * INIT IPIs. - */ - apic_printk(APIC_VERBOSE, "disabling APIC mode, " - "entering PIC mode.\n"); - outb(0x70, 0x22); - outb(0x00, 0x23); - return; - } -#endif - - /* Go back to Virtual Wire compatibility mode */ - - /* For the spurious interrupt use vector F, and enable it */ - value = apic_read(APIC_SPIV); - value &= ~APIC_VECTOR_MASK; - value |= APIC_SPIV_APIC_ENABLED; - value |= 0xf; - apic_write(APIC_SPIV, value); - - if (!virt_wire_setup) { - /* - * For LVT0 make it edge triggered, active high, - * external and enabled - */ - value = apic_read(APIC_LVT0); - value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING | - APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR | - APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED); - value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING; - value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_EXTINT); - apic_write(APIC_LVT0, value); - } else { - /* Disable LVT0 */ - apic_write(APIC_LVT0, APIC_LVT_MASKED); - } - - /* - * For LVT1 make it edge triggered, active high, - * nmi and enabled - */ - value = apic_read(APIC_LVT1); - value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING | - APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR | - APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED); - value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING; - value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_NMI); - apic_write(APIC_LVT1, value); -} - -void __cpuinit generic_processor_info(int apicid, int version) -{ - int cpu; - - /* - * Validate version - */ - if (version == 0x0) { - pr_warning("BIOS bug, APIC version is 0 for CPU#%d! " - "fixing up to 0x10. (tell your hw vendor)\n", - version); - version = 0x10; - } - apic_version[apicid] = version; - - if (num_processors >= nr_cpu_ids) { - int max = nr_cpu_ids; - int thiscpu = max + disabled_cpus; - - pr_warning( - "ACPI: NR_CPUS/possible_cpus limit of %i reached." - " Processor %d/0x%x ignored.\n", max, thiscpu, apicid); - - disabled_cpus++; - return; - } - - num_processors++; - cpu = cpumask_next_zero(-1, cpu_present_mask); - - if (version != apic_version[boot_cpu_physical_apicid]) - WARN_ONCE(1, - "ACPI: apic version mismatch, bootcpu: %x cpu %d: %x\n", - apic_version[boot_cpu_physical_apicid], cpu, version); - - physid_set(apicid, phys_cpu_present_map); - if (apicid == boot_cpu_physical_apicid) { - /* - * x86_bios_cpu_apicid is required to have processors listed - * in same order as logical cpu numbers. Hence the first - * entry is BSP, and so on. - */ - cpu = 0; - } - if (apicid > max_physical_apicid) - max_physical_apicid = apicid; - -#ifdef CONFIG_X86_32 - /* - * Would be preferable to switch to bigsmp when CONFIG_HOTPLUG_CPU=y - * but we need to work other dependencies like SMP_SUSPEND etc - * before this can be done without some confusion. - * if (CPU_HOTPLUG_ENABLED || num_processors > 8) - * - Ashok Raj - */ - if (max_physical_apicid >= 8) { - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_INTEL: - if (!APIC_XAPIC(version)) { - def_to_bigsmp = 0; - break; - } - /* If P4 and above fall through */ - case X86_VENDOR_AMD: - def_to_bigsmp = 1; - } - } -#endif - -#if defined(CONFIG_X86_SMP) || defined(CONFIG_X86_64) - /* are we being called early in kernel startup? */ - if (early_per_cpu_ptr(x86_cpu_to_apicid)) { - u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid); - u16 *bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid); - - cpu_to_apicid[cpu] = apicid; - bios_cpu_apicid[cpu] = apicid; - } else { - per_cpu(x86_cpu_to_apicid, cpu) = apicid; - per_cpu(x86_bios_cpu_apicid, cpu) = apicid; - } -#endif - - set_cpu_possible(cpu, true); - set_cpu_present(cpu, true); -} - -#ifdef CONFIG_X86_64 -int hard_smp_processor_id(void) -{ - return read_apic_id(); -} -#endif - -/* - * Power management - */ -#ifdef CONFIG_PM - -static struct { - /* - * 'active' is true if the local APIC was enabled by us and - * not the BIOS; this signifies that we are also responsible - * for disabling it before entering apm/acpi suspend - */ - int active; - /* r/w apic fields */ - unsigned int apic_id; - unsigned int apic_taskpri; - unsigned int apic_ldr; - unsigned int apic_dfr; - unsigned int apic_spiv; - unsigned int apic_lvtt; - unsigned int apic_lvtpc; - unsigned int apic_lvt0; - unsigned int apic_lvt1; - unsigned int apic_lvterr; - unsigned int apic_tmict; - unsigned int apic_tdcr; - unsigned int apic_thmr; -} apic_pm_state; - -static int lapic_suspend(struct sys_device *dev, pm_message_t state) -{ - unsigned long flags; - int maxlvt; - - if (!apic_pm_state.active) - return 0; - - maxlvt = lapic_get_maxlvt(); - - apic_pm_state.apic_id = apic_read(APIC_ID); - apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI); - apic_pm_state.apic_ldr = apic_read(APIC_LDR); - apic_pm_state.apic_dfr = apic_read(APIC_DFR); - apic_pm_state.apic_spiv = apic_read(APIC_SPIV); - apic_pm_state.apic_lvtt = apic_read(APIC_LVTT); - if (maxlvt >= 4) - apic_pm_state.apic_lvtpc = apic_read(APIC_LVTPC); - apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0); - apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1); - apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR); - apic_pm_state.apic_tmict = apic_read(APIC_TMICT); - apic_pm_state.apic_tdcr = apic_read(APIC_TDCR); -#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL) - if (maxlvt >= 5) - apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR); -#endif - - local_irq_save(flags); - disable_local_APIC(); - local_irq_restore(flags); - return 0; -} - -static int lapic_resume(struct sys_device *dev) -{ - unsigned int l, h; - unsigned long flags; - int maxlvt; - - if (!apic_pm_state.active) - return 0; - - maxlvt = lapic_get_maxlvt(); - - local_irq_save(flags); - -#ifdef HAVE_X2APIC - if (x2apic) - enable_x2apic(); - else -#endif - { - /* - * Make sure the APICBASE points to the right address - * - * FIXME! This will be wrong if we ever support suspend on - * SMP! We'll need to do this as part of the CPU restore! - */ - rdmsr(MSR_IA32_APICBASE, l, h); - l &= ~MSR_IA32_APICBASE_BASE; - l |= MSR_IA32_APICBASE_ENABLE | mp_lapic_addr; - wrmsr(MSR_IA32_APICBASE, l, h); - } - - apic_write(APIC_LVTERR, ERROR_APIC_VECTOR | APIC_LVT_MASKED); - apic_write(APIC_ID, apic_pm_state.apic_id); - apic_write(APIC_DFR, apic_pm_state.apic_dfr); - apic_write(APIC_LDR, apic_pm_state.apic_ldr); - apic_write(APIC_TASKPRI, apic_pm_state.apic_taskpri); - apic_write(APIC_SPIV, apic_pm_state.apic_spiv); - apic_write(APIC_LVT0, apic_pm_state.apic_lvt0); - apic_write(APIC_LVT1, apic_pm_state.apic_lvt1); -#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL) - if (maxlvt >= 5) - apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr); -#endif - if (maxlvt >= 4) - apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc); - apic_write(APIC_LVTT, apic_pm_state.apic_lvtt); - apic_write(APIC_TDCR, apic_pm_state.apic_tdcr); - apic_write(APIC_TMICT, apic_pm_state.apic_tmict); - apic_write(APIC_ESR, 0); - apic_read(APIC_ESR); - apic_write(APIC_LVTERR, apic_pm_state.apic_lvterr); - apic_write(APIC_ESR, 0); - apic_read(APIC_ESR); - - local_irq_restore(flags); - - return 0; -} - -/* - * This device has no shutdown method - fully functioning local APICs - * are needed on every CPU up until machine_halt/restart/poweroff. - */ - -static struct sysdev_class lapic_sysclass = { - .name = "lapic", - .resume = lapic_resume, - .suspend = lapic_suspend, -}; - -static struct sys_device device_lapic = { - .id = 0, - .cls = &lapic_sysclass, -}; - -static void __cpuinit apic_pm_activate(void) -{ - apic_pm_state.active = 1; -} - -static int __init init_lapic_sysfs(void) -{ - int error; - - if (!cpu_has_apic) - return 0; - /* XXX: remove suspend/resume procs if !apic_pm_state.active? */ - - error = sysdev_class_register(&lapic_sysclass); - if (!error) - error = sysdev_register(&device_lapic); - return error; -} -device_initcall(init_lapic_sysfs); - -#else /* CONFIG_PM */ - -static void apic_pm_activate(void) { } - -#endif /* CONFIG_PM */ - -#ifdef CONFIG_X86_64 -/* - * apic_is_clustered_box() -- Check if we can expect good TSC - * - * Thus far, the major user of this is IBM's Summit2 series: - * - * Clustered boxes may have unsynced TSC problems if they are - * multi-chassis. Use available data to take a good guess. - * If in doubt, go HPET. - */ -__cpuinit int apic_is_clustered_box(void) -{ - int i, clusters, zeros; - unsigned id; - u16 *bios_cpu_apicid; - DECLARE_BITMAP(cl