Archive for the ‘Uncategorized’ Category

What is Knowledge?

December 6, 2013

Knowledge is a domain that encompasses contradictory traits; “knowing” describes knowledge of true or false statements to be absolutely true. Knowledge can be classified* in one of three classes: knowledge that I know I know. This is the knowledge that I am aware of, and do know very well. Knowledge that I know I don’t know. This is knowledge that I am aware exists, but I’m unfamiliar with it or I do not know it to be unequivocally true. Finally, the knowledge that I don’t know, I don’t know. This is the largest class of knowledge, and it is large simply because I cannot be an expert on every topic, and what I know spans a small percentage of it. So, I am going to claim that no one can have absolute knowledge, instead, we can only have relative knowledge that can be viewed as subjective or objective.

I hypothesize that there should always be knowledge that I don’t know I don’t know, and that is because if I believe that I know of the existence of all knowledge (knowledge I know I know, and knowledge I know I don’t know), then that means that there is nothing that exists beyond my knowledge or awareness. Since this claim doesn’t contradict the fact that there exists at least one thing that I don’t know that I don’t know, both claims can coexist at the same time. This means that the latter claim can possibly be true and cannot be proven untrue unless there is a source of absolute knowledge that can be used as a reference to my knowledge level. Thus, there is always the possibility that something exists that I don’t know I don’t know, and that cannot be proven wrong unless all knowledge is finite. However, if knowledge is finite, that means there exists a point in knowledge learning that I can reach where there is absolutely no chance that I do not know what statements I do not know. And since I’ve already shown above that I cannot verify the existence of things that I do not know I do not know, there is no way to prove the claim that knowledge is finite. Thus, knowledge is infinite because no matter how much I learn, the claim that there exists at least one thing that I do not know I do not know is still valid, albeit and due to it being intrinsically unverifiable.

Another argument to prove that no one can have absolute knowledge is that knowledge can be many versions of the truth. Different believers in different gods would have different knowledge versions of what a god is. If I possess absolute knowledge, then I must “know” all the different versions, which means that I have to possess all possible opinions of any topic, and “know” them to be true individually. This contradicts being opinionated or knowledgeable on a topic when you possess all possible opinions. This last statement can be simply proven by stating that such an opinion (of all opinions) can be a form of one opinion, with which there exists at least an opinion that disagrees with it (all opinions that “know” one version would implicitly disagree with all other versions), which forms one more opinion that is not included in this universal opinion, unless the universal opinion also includes opinions that do not agree with itself! Which shows self-admission of a weakly formed opinion, which doesn’t rise up to the strength of knowing something absolutely.

What is knowledge? Knowledge is “knowing” something absolutely. It could be viewed as subjective as it can be polluted with one’s opinion, although to that person, her opinion is truth. “Knowing” one’s god doesn’t mean that that specific god exists. I can “know” anything I want because it doesn’t have to be absolutely true, although to me specifically (in my opinion), it may be. It is easy to prove that knowledge is not always true, by looking at all religions and how they cannot all be true, especially given the contradiction among different aspects of those religions. Furthermore, if knowledge was always true, then knowledge does not exist today because I won’t know for sure if what scientists discovered today would remain standing (remain true) in future discoveries and I won’t know for sure that my understandings of the nature of things around me would hold. Unless I can prove that, knowledge does not exist until the definition of knowledge is relaxed and it can encompass true and false statements as well. Knowledge has to include both true and false understandings of a phenomena. For example, “knowing” Jesus Christ was resurrected, to be true for one person such as a Christian and to be false by a Muslim. Thus, “knowing” that Jesus Christ was resurrected can be simultaneously true and false in the domain of knowledge.

Also, another argument that knowledge doesn’t exist if it has to mean absolute correctness or trueness (invariant across people), is that my knowing of my god, and yours of your god mean that neither of us possess any knowledge, given that there are two versions of it and both cannot be true at the same time. So, knowledge can be false and subjective, and can have many versions of varying truthfulness.

How is knowledge different than opinion? Belief? Understanding? An opinion is a formulation or a result of a collection of knowledge elements or facts. A person formulates an opinion after acquiring knowledge of a certain subject. An opinion leads to an established or fixed belief that is no longer subjective to deviation. Understanding of something defines a weaker knowledge in the topic. “Knowing” is a most powerful understanding of something.

Knowledge is relative and subjective. In Plato’s cave, one learns a lot about other objects around him by observing. Sometimes what you observe is a minimal representation of the actual object or its original behavior. Maybe I just see shadows of the real object, and the object although fairly represented by its shadow, is significantly different in reality. My knowing of the object may converge or even diverge as more characteristics of the object are revealed. Furthermore, since the learning experience cannot be provenly bounded, there is no way to determine its ceiling, knowledge can only mutate and never be fixed. The moment knowledge is fixed for all topics, the moment we solve all problems in the world, observable and unobservable, and reach absolute knowledge.

Is there such a thing as absolute knowledge? Can it be attained by anyone? Can it be described? What is absolute knowledge? I know what knowledge is (defined above). And I know that knowledge does not always have to be true, correct, or complete. And I know absolute knowledge to be “knowing” all versions of everything. If all knowledge can be quantified and identified, then it must also be assumed attainable, which means that all knowledge has to be finite. I have already proven that knowledge is not finite and can never be proven to be finite due to the need of a proof to address the following contradictory statements:

1. All knowledge can be reduced to knowledge that I know I know, and knowledge that I know I don’t know after some learning.
2. There exists at least one thing that I do not know I do not know.

Thus knowledge is infinite since the first statement above cannot be proven given the second statement being unprovable since we cannot prove that all the things that we are not aware exist, do not actually exist. This implies that knowledge is infinite, and cannot be fully attained, and our knowledge is relative, and can and will encompass absolutely true or false things.

* The three classifications mentioned above are borrowed from a presenter at the No Fluff Just Stuff IT conference in 2009.


Summary of Instruction Level Parallelism Limitation

December 5, 2012

Limitation of ILP

1. Code is not mostly parallelizable. Upper limit on parallelization according to Amdahl’s law. ILP speedup may need to be in the 5 to 20 factor to be accepted generally given the VLIW processor complexity that needs to be introduced to achieve that [A]. During simulations, even with increased hardware available to processor, programs did not achieve a corresponding linear increase in performance by exploiting ILP [C].

2. Data dependency. Either stall and waste resources (and have a simple hardware), or:
2.1. Data speculation [B] – complexity of handling predictions (serialized and non-serialized methods), handling spurious exceptions, memory disambiguation/aliasing, and misprediction undoing penalty.
2.2. Forwarding – does not completely avoid stalling, and increases hardware complexity.
2.3. Instruction reordering (compiler and/or hardware) [C]. This technique is used to remove data dependency mostly originated by waiting for a memory operation to finish (memory operations account for 30-40% of total code [C]). This approach introduces hardware complexity such as reorder buffers to allow out-of-order execution but in-order retirement of instructions [C]. Memory delays could also be shortened through the use of I/D caches. However, those present their own sets of challenges and limitations (limited size, enforcing data coherency across processor caches, managing efficient cache replacement policies overhead, etc.) Generally speaking, there are many ways to accomplish instruction re-ordering:
2.3.1. Compilers can tag instructions with dependency flags (all instructions this current instruction is dependent on) such as in dependence processors (Dataflow). This can also be accomplished by the processor itself without the help of the compiler such as in sequential (superscalar) processors (although they may also use suggestions from the compiler but will not guarantee using those suggestions) [A].
2.3.2. Compilers tag how many instructions within the last M instructions that this current instruction is dependent on (such as in Horizon processor) [A].
2.3.3. Compilers can group instructions together in traces (this will include moving and duplicating instructions in various basic blocks of execution to allow executing (early) instructions as part of a basic block that would definitely be executed regardless of control decisions between. More code size, but higher performance overall.
2.3.4. Compilers can use register renaming and loop unrolling to remove data dependency across iterations, and speed up execution (sending separate loops to execute in parallel), this is referred to as software pipelining [A]. This adds a trade-off between how many loops to get higher throughput versus increasing code size (many loops) when some of those loops may end up unnecessary (iterations end earlier than unrolled code). Software pipelining goes beyond loop unrolling, it also brings code not dependent on looping outside of the loop (some to the top [prologue] and some to the bottom [epilogue] of the middle which is truly iterable code [kernel]), and then the true iterable code will be re-rolled. This type of compiler scheduling is called Modulo scheduling [A]. This approach can also cause register spilling (more registers needed than absolutely necessary for program execution), condition prediction (we need to speculate on whether the loop would be executed at least one more time to unroll – static prediction), true dependencies in data used in iteration i after being written in iteration < i, memory aliasing issues (will some pointers in one iteration write to the same address as in subsequent iterations?), etc.

3. Control dependency. Either stall and waste resources, or:
3.1. Branch prediction – complexity of handling predictions, handling spurious exceptions, and misprediction reversal (reservation station, instruction caches, and instruction commits are used to allow this).
3.2. Simultaneous branch execution (branch fanout) – more functional units, more registers, waste of resources that otherwise could have been used somewhere else, more management of what gets committed, and what gets to be discarded, etc.
3.3. Compilers and hardware working together to add more delay slot instructions, reordering of instructions.

4. Optimizing and parallelizing programs depend on knowing which basic blocks (or portions of the code) would run more frequently than others – execution frequency of basic blocks [A]. This can be achieved via analysis of the control flow graph of the code through the use of profilers. This is not a straightforward process, and it is highly dependent on the workload at runtime.

5. Basic block ILP limitation [A] can be mostly attributed to (aside from data dependency already mentioned above) limitations of functional hardware units due to similar operations within the block (albeit independent). For example, if the processor has 5 ALU units available to execute 5 adds in parallel (same cycle), but a basic block has 6 independent adds, then we need two cycles to execute instead of one. That is why VLIW instructions will include differing operations that can execute in the same cycle rather than many of the same operation (dependent on what is available in the hardware). Furthermore, ILP could be limited to 3-4 parallelizable instructions out of around 10 in a basic block (high limit) [F]. This is just a limitation of parallelism in the code itself.

6. Naming dependency (except for true dependencies). This can be resolved with register [C] and memory (except for ones with alias problem potential) renaming which is usually done by the compiler [B]. It is still limited by register availability and management (besides potentially introducing register spilling issues, we may also run into bandwidth issues in register files due to more reads and writes on the expanded list of registers).

7. A big reason why code is hard to parallelize is address calculations [D]. Eliminating or reducing long dependency chains of address calculations via compiler optimized code is seen to increase ILP. The paper [D] refers to some techniques in other resources to address addressing computation optimization, so I will have to do more reading.

8. Most of the data dependency that limits ILP comes from true dependencies (read after write dependencies) because the other two types (anti-dependencies [write after read] and output dependencies [write after write]) can be managed with renaming [E]. Those true dependencies come primarily from compiler-induced optimizations to support the high level language abstractions [E]. The compiler will introduce a high usage of the stack to reflect activation records corresponding to function calls and private variables, introducing a true dependency that would significantly reduce parallelization [E]. The true limitation is not based on the time it takes to allocate or deallocate the stack pointer register for an activation record, but based on the long chain of dependencies introduced [E]. [E] shows that even if all dependencies were completely eliminated, leaving only those that update the stack pointer, the total performance gained is nearly unchanged (controlling the absolute limit for achieving ILP). Removing stack update dependencies, however, have shown to provide significant performance gains even compared to perfect prediction and renaming – use heap based activation record allocation instead of stack allocation (accept higher overhead of allocating to enable multi-threading and true parallel execution of program traces). Other suggestions may include the use of multiple stacks or switching between stack-based versus heap-based at compile time based on the depth of the stack calling chain (the deeper the call stack, the more benefit gained from using heap-based activation record allocation) [E]. Some papers show that by increasing the window size, more parallelism can be exploited, while [E] shows that while that may be true, “distant” dependencies (beyond the window size) cannot be exploited with out-of-order instruction issue by superscalars, and other methods are needed under a reasonable window size limitation.

Things to look for or read about

1. Window size adjustments. How about compiler-controlled variable sizes?

2. Perfecting branch predictors, very important since it has a major impact on exploiting ILP. Most papers I read have done simulations under unreasonable assumptions such as perfect prediction, unlimited resources, unlimited bandwidth, unreasonably low penalty cost for mispredictions, ignoring spurious exceptions, etc.

3. Handling memory aliases at compiler time.

4. Choosing stack based or heap based activation record allocation at compile time. Maybe even considering multiple stacks – addresses true dependencies introduced by the compiler via deep chain of function call dependencies. A major performance increase can be gained here.

5. Clock rate per operation variation to increase throughput for faster operations. This can potentially increase a low ceiling on potential throughput even for embarrassingly parallel operations on the same CPU.

6. Generation of per-thread-based traces by the compiler, taking into account shared versus dedicated on chip memory, possible proximity to shared caches, etc.

7. Can traces be concurrent rather than parallel? Allowing for concurrent execution rather than parallel execution (allowing values to forward or be written to shared caches rather than waiting for a complete trace to finish before another one to start even on separate cores).

8. Maybe enforce convention by the compiler to allow predictable address fanout (range of memory address) for given functions or programs. For example, for dynamically allocated objects, the compiler may enforce via hints to the hardware how far apart they need to be on the heap, which will allow the hardware to take advantage of locality when loading a cache line from memory. Those can only be hints due to memory allocation and page replacement strategies, but a cooperation from the hardware and hints from the software can increase this utilization.

9. Exploit the nature of sequence analysis algorithms to optimize performance.

10. A hybrid processor approach to realize ILP and DLP (combining VLIW/superscalar and vector processors).


A. Instruction-Level Parallel Processing. Joseph A. Fisher, B. Ramakrashna Rau.
B. Limits of Instruction Level Parallelism with Data Value Speculation. Jose Gonzalez and Antonio Gonzalez.
C. Exploiting Instruction- and Data-Level Parallelism. Roger Espasa and Mateo Valero.
D. Limits and Graph Structure of Available Instruction-Level Parallelism. Darko Stevanovic and Maragaret Martonosi.
E. The Limits of Instruction Level Parallelism in Spec95 Applications. Matthew A. Postiff, David A. Greene, Gary S. Tyson, and Trevor N. Mudge.
F. Limits of Instruction-Level Parallelism. David W. Wall.